Ocean internal wave prediction method, device, equipment and storage medium
By extracting salinity and temperature as feature variables from historical marine environmental data, a machine learning model is trained to generate an internal wave prediction model. This solves the problems of insufficient data utilization and high computational cost in existing technologies for marine internal wave forecasting, and achieves efficient and interpretable internal wave forecasting, supporting risk warning for ship navigation and offshore platforms.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- EASY WEATHER (BEIJING) TECH CO LTD
- Filing Date
- 2026-02-04
- Publication Date
- 2026-06-26
Figure CN122286121A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of marine technology, and more specifically, to a method, apparatus, device, and storage medium for predicting ocean internal waves.
Background Technology
[0002] Ocean internal waves are oscillations in the ocean interior caused by density stratification. They typically propagate near density interfaces and carry significant kinetic energy and mass transport capacity. In deep waters, ocean internal waves can reach amplitudes of tens or even hundreds of meters, with periods ranging from minutes to hours, and they are strongly nonlinear and localized. They propagate covertly and arise suddenly, affecting underwater communications and ship navigation. Therefore, the prediction of ocean internal waves has become an important research topic in the field of ocean dynamics.
Summary of the Invention
[0003] This application aims to provide a method, apparatus, device, and storage medium for forecasting ocean internal waves, with the goal of forecasting ocean internal waves accurately and efficiently. The application adopts the following approach. In a first aspect, embodiments of this application provide a method for forecasting ocean internal waves. This method includes: acquiring historical ocean environmental data at multiple spatiotemporal locations within a target area; generating, based on historical satellite internal wave observation data and through spatial buffer analysis and rasterization processing, an internal wave label dataset that spatiotemporally matches the historical ocean environmental data, wherein each internal wave label indicates whether an internal wave exists at the corresponding spatiotemporal location; determining feature variables for model training from the historical ocean environmental data, the feature variables including salinity and temperature; and training a machine learning model based on the historical data of the feature variables and their corresponding internal wave labels to obtain an internal wave prediction model for predicting the probability of internal wave occurrence.
[0004] In some embodiments, the machine learning model includes one of the following: a logistic regression model, a random forest model, or a gradient boosting decision tree model.
[0005] In some embodiments, historical satellite internal wave observation data includes historical internal wave linear vector data from satellites. Accordingly, based on the historical satellite internal wave observation data, an internal wave label dataset is generated, including: acquiring historical internal wave linear vector data, which includes at least one internal wave linear feature, each internal wave linear feature representing the position and morphology of a historical internal wave on the sea surface; transforming the initial coordinate system of the historical internal wave linear vector data to a predefined geographic coordinate system; merging at least one internal wave linear feature in the predefined geographic coordinate system to obtain merged linear features; using the merged linear features as the centerline, setting a buffer of a preset width to generate internal wave influence area surface data, the internal wave influence area surface data indicating the internal wave influence area surface; and determining the spatial relationship between multiple grid points corresponding to the historical marine environmental data and the internal wave influence area surface data to generate an internal wave label dataset, wherein grid points located within the internal wave influence area surface are marked as internal waves occurring, and grid points located outside the internal wave influence area surface are marked as internal waves not occurring.
[0006] In some embodiments, training a machine learning model based on historical data of feature variables and their corresponding internal wave labels includes: performing dimensional compression on the initial historical data of the feature variables in the target area to obtain target historical data; standardizing the target historical data; constructing a training sample set with the standardized target historical data as input and the corresponding internal wave labels as output; and training a logistic regression model on the training sample set, with its class weight parameters configured to automatically balance the ratio of positive and negative samples, to obtain the internal wave prediction model.
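The standardization and sample-construction steps can be illustrated with a minimal sketch. It assumes z-score standardization (the application does not name a specific scaling method), and the function name `standardize` and the toy values are hypothetical.

```python
import math

def standardize(columns):
    """Z-score each feature column; return scaled columns and (mean, std) per column."""
    scaled, stats = {}, {}
    for name, values in columns.items():
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        std = std or 1.0  # constant column: avoid division by zero
        scaled[name] = [(v - mean) / std for v in values]
        stats[name] = (mean, std)
    return scaled, stats

# Hypothetical surface values at three grid points, with their internal wave labels.
features = {"temperature": [28.0, 27.0, 26.0], "salinity": [33.5, 34.0, 34.5]}
labels = [1, 0, 0]

scaled, stats = standardize(features)
# One training sample per grid point: (temperature, salinity, label).
samples = list(zip(scaled["temperature"], scaled["salinity"], labels))
```

The stored means and standard deviations would need to be reused unchanged when forecast data are scaled at prediction time.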
[0007] In some embodiments, performing dimensional compression on the initial historical data of the feature variables in the target area includes: determining the original data structure of the initial historical data, the original data structure including a depth dimension representing the vertical layers of the ocean, a time dimension, a longitude dimension, and a latitude dimension; determining, from the initial historical data, target historical data that corresponds to the depth layer at the ocean surface and whose time value is a target date, based on the depth value indicated by the depth dimension and the time value indicated by the time dimension; and storing the target historical data in a target data structure, the target data structure including a longitude dimension and a latitude dimension.
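As a rough illustration of this dimensional compression, the sketch below selects the surface layer and the target date from a four-dimensional field. The axis order `[time][depth][lat][lon]`, the function name `select_surface_slice`, and the toy values are assumptions made for the example only.

```python
from datetime import date

def select_surface_slice(field, depths, times, target_date):
    """Reduce a 4-D field indexed [time][depth][lat][lon] to a 2-D [lat][lon] grid."""
    # Surface layer: the depth value closest to zero.
    k = min(range(len(depths)), key=lambda i: abs(depths[i]))
    # Index of the requested date; raises ValueError if the date is absent.
    t = times.index(target_date)
    return [row[:] for row in field[t][k]]

# Hypothetical toy data: 2 dates x 2 depth layers x 2 latitudes x 3 longitudes.
depths = [0.5, 10.0]
times = [date(2019, 6, 14), date(2019, 6, 15)]
temperature = [
    [[[28.0, 28.1, 28.2], [27.9, 28.0, 28.1]],   # 14 June, 0.5 m
     [[26.0, 26.1, 26.2], [25.9, 26.0, 26.1]]],  # 14 June, 10 m
    [[[28.5, 28.6, 28.7], [28.4, 28.5, 28.6]],   # 15 June, 0.5 m
     [[26.5, 26.6, 26.7], [26.4, 26.5, 26.6]]],  # 15 June, 10 m
]

surface = select_surface_slice(temperature, depths, times, date(2019, 6, 15))
```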
[0008] In some embodiments, training a logistic regression model on a training sample set includes: setting corresponding weight coefficients for the positive and negative classes in the loss function of the logistic regression model according to the ratio of the number of positive samples to the number of negative samples in the training sample set, wherein the internal wave label corresponding to a positive sample indicates that an internal wave has occurred, and the internal wave label corresponding to a negative sample indicates that an internal wave has not occurred; and training the logistic regression model using the loss function with the set weight coefficients.
[0009] In some embodiments, the method provided in this application further includes: applying the internal wave prediction model to a test set to obtain the prediction result of the internal wave prediction model on the test set; calculating the area under the ROC curve based on the prediction result and the true internal wave labels corresponding to the test set; and evaluating the discriminative ability of the internal wave prediction model based on the area under the ROC curve.
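One way to picture the class weighting is the sketch below. It assumes the common "balanced" heuristic, in which each class weight equals n_samples / (2 × n_class); the application only states that the weights follow the ratio of positive to negative samples, so this particular formula and the function names are assumptions.

```python
import math

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency (assumed 'balanced' heuristic)."""
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    # Assumes both classes are present; otherwise a division by zero occurs.
    return {0: n / (2 * n_neg), 1: n / (2 * n_pos)}

def weighted_log_loss(labels, probs, weights, eps=1e-12):
    """Mean binary cross-entropy with a per-class weight on each sample."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        total += -weights[y] * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

# Hypothetical imbalanced set: one internal wave sample among ten grid points.
labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
weights = balanced_class_weights(labels)
loss = weighted_log_loss(labels, [0.5] * 10, weights)
```

With these weights the single positive sample contributes as much to the loss as the nine negative samples combined, which is the balancing effect described above.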
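The area under the ROC curve can be computed without drawing the curve, because it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below uses this pairwise formulation; the function name `roc_auc` and the test values are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical test set: true labels and predicted internal wave probabilities.
y_true = [1, 0, 1, 0, 0]
y_prob = [0.9, 0.4, 0.35, 0.2, 0.1]
auc = roc_auc(y_true, y_prob)
```

A value near 0.5 indicates no discriminative ability, while a value near 1 indicates that positive grid points are almost always ranked above negative ones. The pairwise loop is quadratic in the sample count, so a rank-based implementation would be preferable for large grids.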
[0010] In some embodiments, the method provided in this application further includes: obtaining forecast data of characteristic variables of the target area in the future time period; and calculating the probability field of occurrence of internal waves in the future time period using an internal wave prediction model based on the forecast data.
[0011] Secondly, embodiments of this application also provide an ocean internal wave forecasting device, comprising: an acquisition module for acquiring historical ocean environmental data at multiple spatiotemporal locations within a target area; a processing module for generating an internal wave label dataset that spatiotemporally matches the historical ocean environmental data acquired by the acquisition module, based on historical satellite internal wave observation data, through spatial buffer analysis and rasterization processing, wherein each internal wave label is used to indicate whether an internal wave exists at the corresponding spatiotemporal location; the processing module for determining feature variables for model training from the historical ocean environmental data acquired by the acquisition module, the feature variables including salinity and temperature; and the processing module for training a machine learning model based on the historical data of the feature variables and their corresponding internal wave labels to obtain an internal wave prediction model for predicting the probability of internal wave occurrence.
[0012] Thirdly, embodiments of this application also provide an electronic device, including: a processor and a memory communicatively connected to the processor; the memory stores computer-executable instructions; the processor executes the computer-executable instructions stored in the memory to implement the method in any possible implementation of the first aspect.
[0013] Fourthly, embodiments of this application also provide a computer-readable storage medium storing computer-executable instructions, which, when executed by a processor, are used to implement the method in any possible implementation of the first aspect described above.
[0014] The ocean internal wave forecasting method, apparatus, equipment, and storage medium provided in this application extract feature variables associated with internal wave occurrence from historical ocean environmental data and combine them with historical satellite internal wave observation data to generate an internal wave label dataset that spatiotemporally matches the historical ocean environmental data. Then, based on the historical data of these feature variables and the corresponding internal wave labels, a machine learning model is trained to obtain a predictive model for predicting the probability of internal wave occurrence. Compared with the complex calculations involving massive, multidimensional ocean physical fields in traditional numerical forecasting methods, this significantly reduces computational resource consumption and time costs, effectively improving the timeliness of internal wave forecasting.
Description of the Drawings
[0015] Figure 1 is a schematic diagram of an exemplary ocean internal wave prediction system provided in an embodiment of this application; Figure 2 is a flowchart of an ocean internal wave prediction method provided in an embodiment of this application; Figure 3 is a flowchart of another ocean internal wave prediction method provided in an embodiment of this application; Figure 4 is a schematic diagram of the module structure of an ocean internal wave forecasting device provided in an embodiment of this application; Figure 5 is a hardware structure diagram of an electronic device provided in an embodiment of this application.
Detailed Implementation
[0016] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims.
[0017] In the embodiments of this application, the terms "first" and "second" are used to distinguish identical or similar items with substantially the same function and effect. Those skilled in the art will understand that the terms "first" and "second" do not limit the quantity or execution order, nor do they necessarily imply difference. It should be noted that words such as "exemplary" or "for example" are used to indicate that something is being used as an example, illustration, or description. Any embodiment or design scheme described as "exemplary" or "for example" in this application should not be construed as being more preferred or advantageous than other embodiments or design schemes. Specifically, the use of words such as "exemplary" or "for example" is intended to present the relevant concepts in a concrete manner. In the embodiments of this application, "at least one" refers to one or more, and "more than one" refers to two or more.
[0018] It should be noted that the phrase "at...time" in the embodiments of this application can refer to the instant at which a certain situation occurs, or to a period of time after the occurrence of a certain situation; the embodiments of this application do not specifically limit this. Furthermore, the ocean internal wave forecasting method provided in the embodiments of this application is merely an example, and the method may include more or fewer steps.
[0019] The exemplary embodiments of this application are described below with reference to the accompanying drawings. It should be understood that the exemplary embodiments described herein are for illustration and explanation only and are not intended to limit this application. Furthermore, the embodiments and features in the embodiments of this application can be combined with each other without conflict.
[0020] Ocean internal waves are waves that occur in the interior of a stably stratified ocean, with frequencies between the inertial frequency and the buoyancy frequency. Their maximum amplitude occurs within the ocean interior. Internal waves can be generated when seawater density is stably stratified and a disturbance source is present. Based on vertical structure and propagation characteristics, ocean internal waves can be classified into linear internal waves, nonlinear internal solitary waves, and internal tides. Ocean internal waves propagate covertly and arise suddenly, affecting underwater communication and ship navigation safety. Therefore, ocean internal wave forecasting has become an important research topic in the field of ocean dynamics. Current ocean internal wave forecasting mainly relies on field observation, remote sensing identification, and numerical simulation. However, these approaches suffer from insufficient data utilization, high computational costs, inadequate forecasting capabilities for sporadic events, and low operational integration, making it difficult to meet regional, real-time, and probabilistic operational forecasting needs.
[0021] In this embodiment, key environmental factors (including salinity and temperature) closely related to the internal wave generation mechanism are used as feature variables from historical marine environmental data. High-precision internal wave labels are constructed based on historical satellite observation data, and a machine learning model is trained to obtain an internal wave probability prediction model. This method, by using feature variables with clear physical meaning, makes the model highly interpretable and physically consistent with the internal wave generation mechanism (such as stratification changes). This reduces computational complexity, improves forecast timeliness, and ensures high reliability and operational applicability of the forecast results. The probability forecast results obtained using the forecasting method provided in this application can be used for risk warning and decision support in scenarios such as ship navigation and offshore platform operations.
[0022] In some embodiments, this application provides an ocean internal wave forecasting system. This system is a comprehensive platform integrating marine environmental monitoring, intelligent internal wave identification, probabilistic modeling, and visual forecasting. The system collects key marine environmental factors in real time through intelligent sensing terminal devices deployed in the ocean, combines multi-source data such as reanalysis data (e.g., CORA2.0 reanalysis data), and uses machine learning models to achieve automatic identification, key factor analysis, and dynamic forecasting of the occurrence probability of ocean internal waves, serving fields such as marine engineering, shipping safety, and environmental monitoring. The ocean internal wave forecasting system provided in this application has the following characteristics:
- Multi-source data fusion: integrates on-site observations, reanalysis data, and satellite remote sensing data to improve data coverage and reliability.
- High forecast update frequency: supports daily forecast updates to adapt to rapid changes in the marine environment.
- Strong interpretability: identifies key driving factors (such as salinity stratification and current shear) through correlation analysis.
- Operational capability: supports full-process automation from data access to product release.
[0023] Referring to Figure 1, the ocean internal wave forecasting system 10 may include a terminal sensing layer 110, a data transmission and storage layer 120, a core algorithm engine layer 130, and an application service layer 140.
[0024] 1. The terminal sensing layer 110 may include various intelligent sensing terminal devices. For example, a profiling buoy array 111: equipped with a temperature, salinity, and depth sensor, used to acquire real-time data on temperature, salinity, and depth of different water layers. A current meter array 112: deployed in key sea areas to monitor zonal, meridional, and vertical current velocities. A satellite data receiving terminal 113: used to receive remote sensing data such as sea surface height anomalies and sea surface geostrophic current velocities. A meteorological sensor 114: used to measure meteorological elements such as sea surface pressure, wind speed, and wind direction. A communication module 115: transmits real-time data to a central server via satellite or mobile network.
[0025] 2. The data transmission and storage layer 120 may include an edge gateway 121: performing preliminary cleaning, time alignment, and format standardization on the raw data. A cloud server cluster 122: receiving and storing multi-source data. This multi-source data includes, for example, real-time sensor data, reanalysis grid data, and satellite identification results. A database system 123: using a time-series database to store sensor data and a spatial database to manage grid and vector data.
[0026] 3. The core algorithm engine layer 130 may include an internal wave feature extraction module 131, a probability prediction module 132, and a spatiotemporal analysis and visualization module 133. For example, the internal wave feature extraction module 131 is used to extract elements such as salinity, temperature, current velocity, and sea surface height anomalies (also referred to as factors in this embodiment) from multi-source data. Furthermore, it can also be used to perform correlation analysis to screen feature variables that are significantly related to the occurrence of internal waves, such as salinity and temperature.
[0027] The probability forecasting module 132 is used to generate internal wave event labels based on satellite identification results and to construct training samples. A logistic regression model with class weighting is trained as the probability forecasting model, which outputs the internal wave occurrence probability field and supports daily updates.
[0028] The spatiotemporal analysis and visualization module 133 is used to generate internal wave frequency distribution maps and probability forecast heat maps. It also provides overlay analysis functions for key factor time series and internal wave events.
[0029] 4. The application service layer 140 includes a real-time probability forecasting platform 141 and an API data service 142. The real-time probability forecasting platform 141 provides users with a visual interactive interface to display various products generated by the core algorithm engine layer 130 (such as internal wave probability fields, historical frequency maps, and spatial distribution maps of key factors). This platform supports custom area queries, time-sliding forecasts, and threshold warning settings. The API data service 142 provides standardized data interfaces for third-party systems, and the data that can be accessed includes internal wave probability fields and environmental factor grid data calculated by the core algorithm engine layer.
[0030] As an example, the workflow of the aforementioned ocean internal wave forecasting system 10 can be as follows:
- Data aggregation: intelligent sensing terminal devices upload data periodically (e.g., every 6 hours), and the system synchronously updates the reanalysis data and the satellite identification results.
- Feature engineering: extract multi-dimensional features such as temperature, salinity, current, and pressure of the target sea area, and perform standardization and gridding.
- Model prediction: input the standardized features into a pre-trained probability forecasting model to generate the probability field of internal wave occurrence for the day.
- Result optimization: spatially smooth the probability field to improve continuity.
- Visualization and publishing: automatically generate probability forecast maps, mark high-risk areas (e.g., probability > 0.7), and update the internal wave occurrence frequency statistics map.
- Early warning push: when the probability of internal wave occurrence at a grid point exceeds a set threshold, send early warning information to subscribed users.
[0031] It is worth noting that the above-described ocean internal wave forecasting system 10 is an exemplary illustration. In practical applications, the ocean internal wave forecasting system 10 may include more or fewer software or hardware components.
[0032] The ocean internal wave forecasting method provided by the exemplary embodiments of this application is described below in conjunction with the above-described ocean internal wave forecasting system and with reference to the accompanying drawings. It should be noted that the above application scenarios are shown only to facilitate understanding of the spirit and principles of this application, and the embodiments of this application are not limited in any way in this respect.
[0033] As shown in Figure 2, the ocean internal wave prediction method provided in the embodiments of this application includes the following steps.
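The result optimization and early warning steps of this workflow can be sketched as follows. The application does not specify the smoothing method, so the 3x3 moving average used here is an assumption, as are the function names; the 0.7 threshold is the example value given in the workflow.

```python
def smooth_3x3(field):
    """Replace each cell by the mean of its 3x3 neighbourhood, clipped at the grid edges."""
    rows, cols = len(field), len(field[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            window = [field[a][b]
                      for a in range(max(0, i - 1), min(rows, i + 2))
                      for b in range(max(0, j - 1), min(cols, j + 2))]
            out[i][j] = sum(window) / len(window)
    return out

def high_risk_cells(field, threshold=0.7):
    """Return (row, col) indices of grid points whose probability exceeds the threshold."""
    return [(i, j) for i, row in enumerate(field)
            for j, p in enumerate(row) if p > threshold]

# Hypothetical probability field output by the prediction model.
prob = [[0.9, 0.9, 0.1],
        [0.9, 0.9, 0.1],
        [0.1, 0.1, 0.1]]

smoothed = smooth_3x3(prob)
alerts = high_risk_cells(smoothed)  # grid points that would trigger a warning
```

Smoothing lowers isolated peaks, so whether the warning threshold is applied before or after smoothing changes which grid points are flagged; the sketch applies it afterwards.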
[0034] 210. Obtain historical marine environmental data for multiple spatiotemporal locations within the target area.
[0035] In this embodiment, the target area can be the ocean area where internal wave forecasting needs to be performed. It is determined based on the actual application.
[0036] In some embodiments, multiple spatiotemporal locations include multiple observation stations spatially distributed within the target area, and each observation station contains multiple measurement points at different depths; the historical marine environmental data are high temporal resolution data related to internal waves acquired based on these stations and depth points, such as time-series data of temperature, salinity, and depth profiles and time-series data of current profiles. In other words, historical marine environmental data at multiple spatiotemporal locations refers to a set of data points within the target sea area that are both horizontally distributed and vertically hierarchical, and have a sufficiently high sampling frequency over time.
[0037] In some embodiments, the primary data sources used by the ocean internal wave forecasting system 10 include satellite data and historical reanalysis data. Historical marine environmental data from these data sources include sea surface temperature, sea surface salinity, meridional velocity, zonal velocity, etc.
[0038] In some embodiments, step 210 can be performed by the terminal sensing layer 110 of the ocean internal wave forecasting system 10, for example, by acquiring historical ocean environmental data through various intelligent sensing terminal devices of the terminal sensing layer 110. Further, the acquired historical ocean environmental data can be stored by the data transmission and storage layer 120.
[0039] 220. Based on historical satellite internal wave observation data, an internal wave label dataset is generated through spatial buffer analysis and rasterization processing, where each internal wave label is used to indicate whether an internal wave exists at the corresponding spatiotemporal location.
[0040] In some embodiments, historical satellite internal wave observation data may include historical internal wave linear vector data from satellites. For example, the underlying source may be satellite remote sensing imagery, primarily synthetic aperture radar (SAR) imagery. When significant internal waves propagate to the sea surface, the resulting convergent/divergent currents modulate sea surface roughness, thus forming alternating bright and dark stripes on radar images. This type of data is suitable for identifying the sea surface morphology, spatial wavelength, and propagation path of internal waves over a wide range. It can be understood that historical internal wave linear vector data can accurately show whether internal waves occur at a specific spatiotemporal location, making it possible to generate high-precision internal wave labels.
[0041] In some embodiments, high-precision internal wave labels can be generated based on historical internal wave linear vector data from satellites by performing spatial buffer analysis and rasterization.
[0042] Specifically, historical internal wave linear vector data can be acquired. This data includes at least one internal wave linear feature, each representing the position and morphology of a historical internal wave at the sea surface. In some embodiments, the historical internal wave linear vector data can be identified and digitized from satellite remote sensing images by marine remote sensing experts or automated detection algorithms. For example, an arc-shaped internal wave stripe observed on a Terra/MODIS satellite image on June 15, 2019, would be digitized as a polyline composed of a series of latitude and longitude coordinates, along with attributes such as the observation date and satellite source, and stored in Shapefile or GeoJSON format. This dataset may contain multiple internal wave lines observed during the same satellite pass.
[0043] In some embodiments, satellite data from different sources or different processing procedures may use different map projections or coordinate systems. For example, the raw data may use the inherent projection of satellite imagery (such as UTM). To facilitate accurate spatial calculations with marine environmental grid data (typically using the WGS84 geographic coordinate system), the raw data needs to be converted to a unified predefined geographic coordinate system (such as EPSG:4326, i.e., WGS84). This is fundamental to ensuring the accuracy of subsequent spatial analysis. That is, the initial coordinate system of historical internal wave linear vector data can be converted to a predefined geographic coordinate system.
[0044] Furthermore, within this predefined geographic coordinate system, at least one internal wave linear feature is merged to obtain a merged linear feature. Using the merged linear feature as the centerline, a buffer zone of preset width is set to generate internal wave influence area surface data, which indicates the internal wave influence area surface. In a single observation, multiple discrete or discontinuous internal wave lines may be identified.
[0045] The process described above includes merging linear features and creating a buffer zone. Merging linear features combines all linear features belonging to the same event or analysis target into a single geometric object. For example, several nearly parallel internal wave lines observed in the same sea area on the same day can be merged into a single line feature. The buffer zone is created because the internal wave lines observed by satellite are only the "traces" of the waves on the sea surface, while their actual influence range is a strip-shaped area of a certain width. Using the merged line as the centerline, a preset width (e.g., 6 kilometers, which corresponds to approximately 0.054 degrees in GIS at the latitude of the studied sea area) is extended to both sides, generating a strip-shaped polygonal surface. This surface represents the potential influence area of the internal wave phenomenon. For example, a 100-kilometer-long internal wave line, after a 6-kilometer buffer on each side, will generate a polygonal influence area of approximately 1200 square kilometers.
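The two figures quoted above can be checked with a short calculation. The sketch assumes roughly 111 km per degree of latitude and a straight centerline; the function names are hypothetical. Rounded end caps, which many GIS buffer tools add by default, are shown as an option.

```python
import math

KM_PER_DEGREE = 111.0  # approximate length of one degree of latitude (assumption)

def km_to_degrees(width_km):
    """Convert a buffer width in kilometers to degrees of latitude."""
    return width_km / KM_PER_DEGREE

def buffer_area_km2(line_length_km, width_km, round_caps=False):
    """Area of a buffer extending width_km to both sides of a straight line."""
    area = line_length_km * 2 * width_km
    if round_caps:
        area += math.pi * width_km ** 2  # two semicircular end caps
    return area

width_deg = km_to_degrees(6.0)                             # about 0.054 degrees
area_flat = buffer_area_km2(100.0, 6.0)                    # 100 km x 12 km strip
area_round = buffer_area_km2(100.0, 6.0, round_caps=True)  # strip plus end caps
```

The flat-ended strip gives exactly 1200 square kilometers, and rounded end caps add about 113 square kilometers, so "approximately 1200" holds in either case.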
[0046] Finally, the spatial relationship between multiple grid points corresponding to historical marine environmental data and the internal wave influence area data can be determined to generate an internal wave label dataset. Grid points located within the internal wave influence area are marked as indicating internal waves occurring, while grid points located outside the internal wave influence area are marked as indicating internal waves not occurring.
[0047] For example, historical marine environmental data (such as reanalysis data) is typically organized in a regular two-dimensional latitude and longitude grid. Each grid point represents the marine condition at a specific location (e.g., 113.5°E longitude, 20.1°N latitude). Using the spatial query function of a Geographic Information System (GIS), each marine environmental data grid point is tested one by one to determine whether its geographical location lies within the polygon of the "internal wave influence area surface" generated in the previous step. This is a precise geometric containment test. If the grid point is within the surface, it is assumed that an internal wave occurred at the location represented by the grid point at the observation time, and it is assigned label 1 (positive sample). If the grid point is outside the surface, it is assumed that no internal wave occurred at that location, and it is assigned label 0 (negative sample). Finally, the labels of all grid points constitute a binary matrix that is completely spatially aligned with the marine environmental data grid, i.e., the internal wave label dataset.
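The labeling step can be sketched without a GIS library by noting that a point lies inside a round-capped buffer exactly when its distance to the centerline is no greater than the buffer width. The sketch below works in planar degree coordinates, which is an approximation, and the function names and coordinates are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Planar distance from point p to the segment a-b; all points are (lon, lat)."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def label_grid(lons, lats, polyline, buffer_deg):
    """Binary label matrix [lat][lon]: 1 inside the buffer around the polyline, else 0."""
    labels = []
    for lat in lats:
        row = []
        for lon in lons:
            d = min(point_segment_distance((lon, lat), a, b)
                    for a, b in zip(polyline, polyline[1:]))
            row.append(1 if d <= buffer_deg else 0)
        labels.append(row)
    return labels

# Hypothetical 3x3 grid and a single east-west internal wave line at 20.1°N.
lons = [113.4, 113.5, 113.6]
lats = [20.0, 20.1, 20.2]
wave_line = [(113.4, 20.1), (113.6, 20.1)]
labels = label_grid(lons, lats, wave_line, buffer_deg=0.054)
```

In this example only the middle row of grid points, which lies on the wave line, is labeled 1; the rows 0.1 degrees to the north and south fall outside the 0.054-degree buffer.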
[0048] In some embodiments, historical satellite internal wave observation data and historical marine environmental data originate from different sources. As mentioned above, historical satellite internal wave observation data may originate from satellites, referring to files that store information on the position and shape of internal waves interpreted from satellite images. Historical marine environmental data originates from reanalysis data.
[0049] In some embodiments, step 220 can be performed by the terminal sensing layer 110 of the ocean internal wave forecasting system 10, for example, by acquiring historical ocean environmental data through various intelligent sensing terminal devices of the terminal sensing layer 110. Further, the acquired historical ocean environmental data can be stored by the data transmission and storage layer 120.
[0050] 230. Identify feature variables for model training from historical marine environmental data, including salinity and temperature.
[0051] In the embodiments of this application, determining the feature variables used for model training is a key step in achieving efficient and reliable internal wave probability prediction.
[0052] Specifically, this application abandons the traditional numerical forecasting method's reliance on multi-dimensional, highly complex ocean physical fields (such as three-dimensional current velocity, density, and pressure fields), and instead starts from physical mechanisms to perform feature reduction and selection. The generation and propagation of ocean internal waves mainly depend on ocean density stratification (i.e., the variation of seawater density with depth) and background current fields. Among numerous marine environmental factors, seawater salinity and temperature are the two most direct and central physical quantities for calculating seawater density. From vertical profiles of salinity and temperature, the buoyancy frequency characterizing stratification stability can be calculated, which in turn indicates whether conditions favor the occurrence of internal waves. Therefore, this application uses salinity and temperature as feature variables for model training. This approach offers the following advantages.
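To illustrate how temperature and salinity profiles lead to the buoyancy frequency, the sketch below evaluates N² = (g/ρ0)·dρ/dz with depth z measured positive downward. It assumes a simplified linear equation of state with typical order-of-magnitude coefficients; this is an illustrative approximation and not the calculation used in the application, and an operational system would more likely use a full seawater equation of state. All constants, names, and profile values are hypothetical.

```python
G = 9.81        # gravitational acceleration, m/s^2
RHO0 = 1025.0   # reference seawater density, kg/m^3
ALPHA = 2.0e-4  # thermal expansion coefficient, 1/degC (assumed typical value)
BETA = 7.6e-4   # haline contraction coefficient, 1/psu (assumed typical value)
T0, S0 = 10.0, 35.0  # reference temperature (degC) and salinity (psu)

def density(temp_c, salinity):
    """Simplified linear equation of state (assumption for illustration)."""
    return RHO0 * (1.0 - ALPHA * (temp_c - T0) + BETA * (salinity - S0))

def buoyancy_frequency_squared(depths_m, temps_c, salinities):
    """N^2 between adjacent levels; depths are positive downward, in meters."""
    rho = [density(t, s) for t, s in zip(temps_c, salinities)]
    n2 = []
    for k in range(len(rho) - 1):
        drho_dz = (rho[k + 1] - rho[k]) / (depths_m[k + 1] - depths_m[k])
        n2.append(G / RHO0 * drho_dz)
    return n2

# Hypothetical profile: warm, fresher water above cooler, saltier water.
depths = [0.0, 50.0, 100.0]
temps = [28.0, 22.0, 18.0]
salts = [33.5, 34.2, 34.5]
n2 = buoyancy_frequency_squared(depths, temps, salts)
```

Positive N² values indicate stable stratification, the precondition for internal waves noted above, and larger values correspond to a sharper density gradient.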
[0053] The physical mechanism is clear: salinity and temperature are the direct influencing factors of the internal wave generation mechanism, ensuring that the forecast model is built on a solid physical foundation and that the forecast process maintains an inherent consistency with the actual internal wave generation mechanism.
[0054] Stable data acquisition: Salinity and temperature are fundamental constant variables in ocean observation and reanalysis. The high quality and continuous spatiotemporal coverage of the data ensure the reliability and availability of the model's input data.
[0055] Lightweight model: The feature dimensions are reduced from dozens or even hundreds in traditional methods to only a few core physical variables, greatly simplifying the model structure. This not only significantly reduces the computational overhead and memory usage during model training and prediction, substantially lowering computational costs, but also avoids the overfitting risk caused by high-dimensional data and improves the model's generalization ability.
[0056] Highly interpretable: Since the feature variables themselves have clear physical meaning, the model's learning process and final decisions (e.g., which temperature-salinity structure combinations correspond to a high probability of internal wave occurrence) can be intuitively understood and analyzed, meeting the needs of operational forecasting for physical verification and reliability assessment of forecast results.
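As a rough illustration of the stratification argument in [0052], the sketch below estimates the squared buoyancy frequency, N^2 = (g / rho0) * d(rho)/dz with z taken as depth (positive downward), from a temperature and salinity profile. The linearized equation of state, its coefficients, the sample profile, and all function names are simplifying assumptions made for illustration and are not part of this application; an operational implementation would more likely use a full seawater equation of state.

```python
G = 9.81             # gravitational acceleration, m/s^2
RHO0 = 1025.0        # reference density, kg/m^3 (assumed)
ALPHA = 2.0e-4       # thermal expansion coefficient, 1/degC (assumed)
BETA = 7.6e-4        # haline contraction coefficient, 1/psu (assumed)
T0, S0 = 10.0, 35.0  # reference temperature and salinity (assumed)


def density(temp_c, salinity):
    """Linearized equation of state (illustrative assumption)."""
    return RHO0 * (1.0 - ALPHA * (temp_c - T0) + BETA * (salinity - S0))


def buoyancy_frequency_squared(depths_m, temps_c, salinities):
    """Return N^2 (1/s^2) between consecutive depth levels.

    depths_m increases downward, so a stable column (density increasing
    with depth) yields positive values.
    """
    rho = [density(t, s) for t, s in zip(temps_c, salinities)]
    n2 = []
    for k in range(len(depths_m) - 1):
        drho_dz = (rho[k + 1] - rho[k]) / (depths_m[k + 1] - depths_m[k])
        n2.append(G / RHO0 * drho_dz)
    return n2


# Hypothetical profile: warm, fresher water over cold, saltier water.
depths = [0.0, 50.0, 100.0, 200.0]
temps = [28.0, 24.0, 18.0, 12.0]
salts = [33.8, 34.2, 34.5, 34.6]
n2_profile = buoyancy_frequency_squared(depths, temps, salts)
```

Positive values of N^2 indicate a stably stratified layer, which is the condition under which internal waves can propagate.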
[0057] 240. Based on historical data of feature variables and their corresponding internal wave labels, train a machine learning model to obtain an internal wave prediction model for predicting the probability of internal wave occurrence.
[0058] In some embodiments, the machine learning model includes one of the following: a logistic regression model, a random forest model, or a gradient boosting decision tree model.
[0059] The following section uses a logistic regression model as an example to further explain this step.
[0060] The model training process mainly includes: Step 1, preprocessing of feature variable data. Step 2, feature extraction and processing: standardizing key factors and constructing a training sample set with standardized key factors as features and internal wave labels as output. Step 3, model training: training using a logistic regression model and optimizing model parameters through maximum likelihood estimation. Step 4, model output: obtaining a predictive model that can output the probability of internal wave occurrence.
[0061] In some embodiments, before training the model, the feature variables are preprocessed as follows: the initial historical data of the feature variables in the target region are dimensionally compressed to obtain the target historical data, and the target historical data is standardized.
[0062] In some embodiments, the dimensionality compression process includes: determining the original data structure of the initial historical data, which includes a depth dimension representing the vertical layers of the ocean, a time dimension, a longitude dimension, and a latitude dimension; determining, from the initial historical data, the target historical data whose depth layer is the ocean surface and whose time value is the target date, based on the depth values indicated by the depth dimension and the time values indicated by the time dimension; and storing the target historical data in a target data structure, which includes two dimensions: longitude and latitude. Further, if the acquired initial historical data covers an area larger than the target region, it needs to be cropped to the target region. Of course, if the historical data directly acquired is for the target region, the region cropping process is unnecessary.
[0063] In practice, salinity and temperature data are derived from ocean reanalysis datasets with matching spatiotemporal resolution. By performing dimensionality compression (extracting surface or specific layer data) and region clipping on the original data, two-dimensional feature grid data that is perfectly aligned with the internal wave label dataset in terms of spatiotemporal points is formed.
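The dimensionality compression and region cropping described above can be sketched as follows. This is a minimal illustration using nested lists indexed as data[depth][time][lat][lon]; the function name, the bounding-box convention, and the sample values are hypothetical, and real reanalysis files would normally be read with an array library instead.

```python
def compress_to_surface(data, depths, dates, lats, lons, target_date, bbox):
    """Reduce data[depth][time][lat][lon] to a 2-D [lat][lon] grid.

    bbox is (lat_min, lat_max, lon_min, lon_max) for the target region.
    """
    k = min(range(len(depths)), key=lambda i: depths[i])  # shallowest layer
    t = dates.index(target_date)                          # target date
    lat_min, lat_max, lon_min, lon_max = bbox
    rows = [i for i, la in enumerate(lats) if lat_min <= la <= lat_max]
    cols = [j for j, lo in enumerate(lons) if lon_min <= lo <= lon_max]
    grid = [[data[k][t][i][j] for j in cols] for i in rows]
    return grid, [lats[i] for i in rows], [lons[j] for j in cols]


# Hypothetical 2-depth x 2-day x 3-lat x 4-lon temperature cube.
depths = [0.5, 10.0]
dates = ["2014-08-01", "2014-08-02"]
lats = [20.0, 20.5, 21.0]
lons = [115.0, 115.5, 116.0, 116.5]
cube = [[[[100 * d + 10 * t + i + 0.1 * j for j in range(4)]
          for i in range(3)] for t in range(2)] for d in range(2)]

surface, sub_lats, sub_lons = compress_to_surface(
    cube, depths, dates, lats, lons, "2014-08-02", (20.0, 20.5, 115.5, 116.5))
```

The result is a two-dimensional grid on the cropped latitude and longitude axes, which is the shape needed for point-by-point alignment with the internal wave label grid.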
[0064] Furthermore, the salinity and temperature feature grid data after dimensionality compression are standardized (e.g., Z-score standardization) to eliminate dimensional differences, accelerate model convergence, and improve the stability of model training. The standardization parameters (mean and standard deviation) are calculated from the training data, stored, and reused to apply the same transformation to new data in subsequent prediction stages.
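A minimal sketch of Z-score standardization with stored parameters is given below. The function names, the JSON storage format, and the sample values are assumptions chosen for illustration; the point is only that the mean and standard deviation come from the training data and are reused unchanged at prediction time.

```python
import json
from statistics import fmean, pstdev


def fit_scaler(columns):
    """Compute per-feature mean and standard deviation from training data."""
    scaler = {}
    for name, vals in columns.items():
        std = pstdev(vals) or 1.0   # guard against a constant feature
        scaler[name] = {"mean": fmean(vals), "std": std}
    return scaler


def transform(columns, scaler):
    """Apply stored parameters; prediction-time data is never refitted."""
    return {name: [(v - scaler[name]["mean"]) / scaler[name]["std"] for v in vals]
            for name, vals in columns.items()}


# Hypothetical flattened training grids.
train = {"temperature": [26.0, 27.0, 28.0, 29.0],
         "salinity": [33.5, 34.0, 34.0, 34.5]}
scaler = fit_scaler(train)
saved = json.dumps(scaler)   # persisted alongside the model

# Later, in the prediction stage, reuse the saved parameters.
new_data = {"temperature": [27.5], "salinity": [34.0]}
z_new = transform(new_data, json.loads(saved))
```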
[0065] Furthermore, a training sample set can be constructed with standardized target historical data as input and corresponding internal wave labels as output. In some embodiments, standardized feature data (each grid point corresponds to a vector containing two feature values, salinity and temperature) can be paired with internal wave labels (0 or 1) that perfectly match their spatial locations to form a supervised learning sample set.
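The pairing of grid-point features with labels can be sketched as below. The function name and the rule of skipping points with missing values (represented here as None) are assumptions for illustration.

```python
def build_samples(temp_grid, salt_grid, label_grid):
    """Pair each grid point's (salinity, temperature) vector with its 0/1 label."""
    features, labels = [], []
    for t_row, s_row, y_row in zip(temp_grid, salt_grid, label_grid):
        for t, s, y in zip(t_row, s_row, y_row):
            if t is None or s is None:   # assumed convention for land or gaps
                continue
            features.append((s, t))
            labels.append(int(y))
    return features, labels


# Hypothetical standardized 2 x 2 grids and the matching binary label grid.
temp_z = [[0.5, -0.2], [None, 1.1]]
salt_z = [[-0.3, 0.4], [None, 0.9]]
mask = [[0, 1], [0, 1]]
X, y = build_samples(temp_z, salt_z, mask)
```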
[0066] Finally, an internal wave prediction model can be obtained by using, for example, a logistic regression model and configuring its class weight parameters to automatically balance the ratio of positive to negative samples, and training it on the training sample set.
[0067] In some embodiments, to avoid model bias caused by sample imbalance, a class weighting technique is used during the training phase. For example, in the loss function of a logistic regression model, higher weights are assigned to the fewer "internal wave occurrence" samples (positive class), ensuring that the model equally values the learning of positive and negative samples. Specifically, based on the ratio of positive to negative samples in the training sample set, corresponding weight coefficients can be set for the positive and negative classes in the loss function of the logistic regression model. The internal wave label corresponding to a positive sample indicates that an internal wave has occurred, while the internal wave label corresponding to a negative sample indicates that an internal wave has not occurred. The logistic regression model is then trained using the loss function with the set weight coefficients.
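One way to realize the class weighting described above is sketched here in plain Python. The weights follow the common "balanced" heuristic n_samples / (n_classes * n_class_samples); that heuristic, the gradient-descent optimizer, the learning rate, and the sample data are assumptions for illustration rather than settings stated in this application.

```python
import math


def balanced_class_weights(labels):
    """Weight each class by n / (2 * n_class) so the rarer class counts more."""
    n, n_pos = len(labels), sum(labels)
    return {0: n / (2 * (n - n_pos)), 1: n / (2 * n_pos)}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_weighted_logreg(X, y, weights, lr=0.5, epochs=2000):
    """Minimize the class-weighted negative log-likelihood by gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    total = sum(weights[label] for label in y)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = weights[yi] * (p - yi)   # weighted gradient w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / total for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / total
    return w, b


# Hypothetical standardized (salinity, temperature) samples: 6 negative, 2 positive.
X = [(-1.0, -0.8), (-0.6, -1.1), (-0.9, -0.2), (-0.2, -0.7),
     (0.1, -0.9), (-0.5, 0.0), (1.2, 1.0), (0.9, 1.4)]
y = [0, 0, 0, 0, 0, 0, 1, 1]
class_w = balanced_class_weights(y)
coef, intercept = train_weighted_logreg(X, y, class_w)
probs = [sigmoid(sum(c * x for c, x in zip(coef, xi)) + intercept) for xi in X]
```

With these weights, each of the two positive samples contributes three times as much to the loss as each negative sample, so both classes carry equal total weight.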
[0068] In some embodiments, a reserved test set is used to evaluate the performance of the trained model. Specifically, the internal wave prediction model can be used to make predictions on the test set to obtain its prediction results; based on the prediction results and the corresponding real internal wave labels on the test set, the area under the ROC curve is calculated; the discrimination ability of the internal wave prediction model is evaluated based on the area under the ROC curve. In this evaluation process, the core evaluation metric is the area under the ROC curve, which comprehensively evaluates the model's ability to distinguish between internal wave "occurrence" and "non-occurrence" across different discrimination thresholds.
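The area under the ROC curve can be computed directly from its rank interpretation, as in the sketch below: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample. The function name and the sample values are illustrative assumptions.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    A tie between a positive and a negative score counts as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test-set labels and predicted probabilities.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_prob)   # 3 of 4 pairs ranked correctly -> 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the sample count, so a sort-based formulation would be preferable for large grids.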
[0069] The internal wave prediction model obtained through the above training process can receive new salinity and temperature grid data that have undergone the same preprocessing as input, and output the probability value of internal wave occurrence corresponding to each grid point in real time, thereby realizing fast and low-cost internal wave probability grid point forecasting, effectively meeting the urgent needs of forecast timeliness and availability in business applications.
[0070] In some embodiments, steps 230 to 240 can be executed by the core algorithm engine layer 130 of the ocean internal wave forecasting system 10. For example, the internal wave feature extraction module 131 is used to extract feature vectors such as salinity and temperature from multi-source data. The probability forecasting module 132 is used to generate labels for internal wave events identified by satellite and to construct training samples; it trains a probability forecasting model using a logistic regression model combined with class weighting, outputs an internal wave occurrence probability field, and supports daily-scale updates. Furthermore, the spatiotemporal analysis and visualization module 133 can also generate visualization products such as internal wave frequency distribution maps and probability forecasting heat maps based on data such as the probability field.
[0071] In some embodiments, as shown in Figure 3, the method further includes the following steps.
[0072] 310. Obtain forecast data of characteristic variables of the target area for future periods.
[0073] In some embodiments, the future time period can be a specific time interval or point in time for which internal wave prediction is required. In some embodiments, the primary input to the prediction phase is the forecast data of characteristic variables (i.e., salinity and temperature) within the target region for the future time period. This data can be sourced from operational ocean numerical weather prediction systems or short-term climate prediction systems. It is worth noting that the forecast data must be consistent with the training data in terms of spatial resolution, vertical hierarchy (typically surface or a specific standard layer), and data format, and must cover the same target region. After acquiring the forecast data, it must undergo a preprocessing procedure identical to that used in the training phase, including dimensionality compression (extracting surface data corresponding to the forecast time), region cropping to the target region, and standardization transformation using the normalizer saved during the training phase to ensure that the input data is consistent with the data distribution learned during model training.
[0074] 320. Based on the forecast data, use the internal wave prediction model to obtain the probability of internal waves occurring in the future period.
[0075] In some embodiments, a higher probability indicates a greater likelihood of an internal wave occurring at that location within a specified future time period. Furthermore, the prediction results can be output in the form of a probability field in a visualized manner. For example, the output probability field can be spatially smoothed to enhance continuity, and finally, geographic information can be overlaid to create an internal wave occurrence probability distribution map. Furthermore, it can be directly integrated with a Geographic Information System (GIS) to generate risk level zoning maps, or serve as a key input for intelligent ship navigation systems and offshore operation safety monitoring platforms, providing quantitative decision-making basis for route planning and operational window selection.
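The prediction step and the optional smoothing can be sketched as follows. The stored normalizer values, the model coefficients, and the choice of a 3 x 3 mean filter are hypothetical; the application only states that the probability field can be spatially smoothed, without naming a filter.

```python
import math


def predict_probability_field(temp_grid, salt_grid, scaler, coef, intercept):
    """Standardize with training-time parameters, then apply the logistic model."""
    field = []
    for t_row, s_row in zip(temp_grid, salt_grid):
        row = []
        for t, s in zip(t_row, s_row):
            zs = (s - scaler["salinity"]["mean"]) / scaler["salinity"]["std"]
            zt = (t - scaler["temperature"]["mean"]) / scaler["temperature"]["std"]
            logit = coef[0] * zs + coef[1] * zt + intercept
            row.append(1.0 / (1.0 + math.exp(-logit)))
        field.append(row)
    return field


def smooth_3x3(field):
    """Mean filter over each point's available 3 x 3 neighbourhood."""
    n_rows, n_cols = len(field), len(field[0])
    out = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            vals = [field[a][b]
                    for a in range(max(0, i - 1), min(n_rows, i + 2))
                    for b in range(max(0, j - 1), min(n_cols, j + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


# Hypothetical saved normalizer and model parameters (salinity, temperature order).
scaler = {"temperature": {"mean": 28.0, "std": 1.0},
          "salinity": {"mean": 34.0, "std": 0.5}}
coef, intercept = (0.8, 1.2), -1.0

# Hypothetical 2 x 2 forecast grids for one future day.
temp_fc = [[28.0, 29.0], [27.0, 30.0]]
salt_fc = [[34.0, 34.5], [33.5, 34.0]]
prob_field = predict_probability_field(temp_fc, salt_fc, scaler, coef, intercept)
smoothed = smooth_3x3(prob_field)
```

Because a mean filter only averages values that already lie between 0 and 1, the smoothed field remains a valid probability field.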
[0076] In some embodiments, steps 310 and 320 are performed by the ocean internal wave forecasting system 10. For example, the core algorithm engine layer 130 completes the probability forecast and the application service layer 140 visualizes the probability forecast results.
[0077] The training and application process of the prediction model is described below with reference to specific embodiments.
[0078] First, historical marine environmental data for the target period (e.g., August 2014) are acquired, and surface temperature and salinity within the study area are extracted as input features. Based on satellite identification results from the same period regarding the presence of internal waves, the internal wave line vector data are converted into spatial raster labels to generate a binary internal wave mask (1 indicates the presence of internal waves, and 0 indicates the absence of internal waves). The temperature and salinity feature data are then standardized.
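The conversion of internal wave line vector data into a binary raster mask can be illustrated with the sketch below, which marks a grid point as 1 when it lies within a given distance of any line segment. It assumes planar (projected) coordinates in consistent units and treats the buffer width as the distance on either side of the line; both are assumptions, as are the function names and sample geometry. A production workflow would more likely rely on a GIS library for buffering and rasterization.

```python
import math


def point_segment_distance(px, py, ax, ay, bx, by):
    """Shortest distance from point P to segment AB in planar coordinates."""
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def rasterize_buffer(lines, xs, ys, buffer_distance):
    """Return mask[row][col] = 1 where a grid point lies inside the buffer."""
    mask = []
    for y in ys:
        row = []
        for x in xs:
            near = any(
                point_segment_distance(x, y, *a, *b) <= buffer_distance
                for line in lines
                for a, b in zip(line, line[1:])
            )
            row.append(1 if near else 0)
        mask.append(row)
    return mask


# Hypothetical single internal wave crest line and a 3 x 4 grid.
lines = [[(0.0, 0.0), (2.0, 0.0)]]
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 0.5, 1.0]
mask = rasterize_buffer(lines, xs, ys, buffer_distance=0.6)
```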
[0079] Training samples are constructed daily for each spatial grid point within the study area. The input features are the temperature and salinity of that point, and the output label is whether internal waves occur at that point. To address the imbalance between positive and negative samples, a strategy combining random downsampling and class weighting is employed for sample balancing. A logistic regression model is used for training, with class balancing weights enabled during training. Model performance is evaluated by splitting the samples into training and test sets (e.g., by calculating the area under the ROC curve on the test set). After training, the normalizer, model parameters, and grid configuration information are saved.
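Random downsampling of negative samples followed by a train/test split might look like the sketch below. The negative-to-positive ratio, the test fraction, the fixed seeds, and the function names are hypothetical choices for illustration.

```python
import random


def downsample_negatives(X, y, neg_per_pos, seed=0):
    """Keep all positives and at most neg_per_pos negatives per positive."""
    rng = random.Random(seed)
    pos_idx = [i for i, label in enumerate(y) if label == 1]
    neg_idx = [i for i, label in enumerate(y) if label == 0]
    keep_neg = rng.sample(neg_idx, min(len(neg_idx), neg_per_pos * len(pos_idx)))
    keep = sorted(pos_idx + keep_neg)
    return [X[i] for i in keep], [y[i] for i in keep]


def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle indices once and hold out a fraction as the test set."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


# Hypothetical sample set: 20 grid points, 2 of them with internal waves.
X_all = [(0.1 * i, -0.1 * i) for i in range(20)]
y_all = [1 if i in (3, 11) else 0 for i in range(20)]
X_bal, y_bal = downsample_negatives(X_all, y_all, neg_per_pos=3)
X_tr, y_tr, X_te, y_te = train_test_split(X_bal, y_bal)
```

A purely random split can leave the test set without any positive sample when positives are rare, so a stratified split may be preferable in practice.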
[0080] During the prediction phase, the daily temperature and salinity data for the target area undergo the same cropping and standardization process, and are then input into a trained logistic regression model to obtain the probability of internal wave occurrence at each grid point. The output probability field is spatially smoothed to enhance continuity, and finally, geographic information is overlaid to create a distribution map of the probability of internal wave occurrence.
[0081] In this embodiment, key environmental factors (including salinity and temperature) closely related to the internal wave generation mechanism are used as feature variables from historical marine environmental data. High-precision internal wave labels are constructed based on historical satellite observation data, and a machine learning model is trained to obtain an internal wave probability prediction model. This method, by using feature variables with clear physical meaning, makes the model highly interpretable and physically consistent with the internal wave generation mechanism (such as stratification changes). This reduces computational complexity, improves forecast timeliness, and ensures high reliability and operational applicability of the forecast results. The probability forecast results obtained using the forecasting method provided in this application can be used for risk warning and decision support in scenarios such as ship navigation and offshore platform operations.
[0082] Based on the same inventive concept, this application also provides an ocean internal wave forecasting device. This device can, for example, be integrated into the ocean internal wave forecasting system 10 shown in Figure 1. Figure 4 shows a schematic of the structure of an ocean internal wave forecasting device 400, which may include: an acquisition module 410 for acquiring historical ocean environmental data at multiple spatiotemporal locations within a target area; and a processing module 420 for generating an internal wave label dataset that spatiotemporally matches the historical ocean environmental data acquired by the acquisition module 410, based on historical satellite internal wave observation data, through spatial buffer analysis and rasterization processing, wherein each internal wave label is used to indicate whether an internal wave exists at the corresponding spatiotemporal location. The processing module 420 is also used for determining feature variables for model training from the historical ocean environmental data acquired by the acquisition module 410, the feature variables including salinity and temperature, and for training a machine learning model based on the historical data of the determined feature variables and their corresponding internal wave labels to obtain an internal wave prediction model for predicting the probability of internal wave occurrence.
[0083] In some embodiments, the machine learning model includes one of the following: a logistic regression model, a random forest model, or a gradient boosting decision tree model.
[0084] In some embodiments, historical satellite internal wave observation data includes historical internal wave linear vector data from satellites. The processing module 420 is further configured to acquire historical internal wave linear vector data, which includes at least one internal wave linear feature, each internal wave linear feature representing the position and morphology of a historical internal wave on the sea surface; transform the initial coordinate system of the historical internal wave linear vector data to a predefined geographic coordinate system; merge at least one internal wave linear feature in the predefined geographic coordinate system to obtain a merged linear feature; use the merged linear feature as the centerline, set a buffer of a preset width, and generate internal wave influence area surface data, which indicates the internal wave influence area surface; determine the spatial relationship between multiple grid points corresponding to the historical marine environmental data and the internal wave influence area surface data to generate an internal wave label dataset, wherein grid points located within the internal wave influence area surface are marked as internal waves occurring, and grid points located outside the internal wave influence area surface are marked as internal waves not occurring.
[0085] In some embodiments, the processing module 420 is further configured to perform dimensionality compression on the initial historical data of the feature variables in the target region to obtain target historical data; perform standardization processing on the target historical data; construct a training sample set with the standardized target historical data as input and the corresponding internal wave label as output; and use a logistic regression model and configure its class weight parameters to automatically balance the ratio of positive and negative samples, and train it on the training sample set to obtain an internal wave prediction model.
[0086] In some embodiments, the processing module 420 is further configured to determine the original data structure of the initial historical data, the original data structure including a depth dimension representing the vertical layers of the ocean, a time dimension, a longitude dimension, and a latitude dimension; based on the depth value indicated by the depth dimension and the time value indicated by the time dimension, determine, from the initial historical data, the target historical data whose depth layer is the ocean surface and whose time value is the target date; and save the target historical data in a target data structure, the target data structure including a longitude dimension and a latitude dimension.
[0087] In some embodiments, the processing module 420 is further configured to set corresponding weight coefficients for the positive and negative classes in the loss function of the logistic regression model according to the ratio of the number of positive samples to negative samples in the training sample set, wherein the internal wave label corresponding to a positive sample indicates that an internal wave has occurred, and the internal wave label corresponding to a negative sample indicates that an internal wave has not occurred; and train the logistic regression model using the loss function with the set weight coefficients.
[0088] In some embodiments, the processing module 420 is further configured to use the internal wave prediction model to make predictions on a test set and obtain the prediction results of the internal wave prediction model on the test set; calculate the area under the ROC curve based on the prediction results and the real internal wave labels corresponding to the test set; and evaluate the discriminative ability of the internal wave prediction model based on the area under the ROC curve.
[0089] In some embodiments, the processing module 420 is further configured to acquire forecast data of characteristic variables of the target area in the future time period; and calculate the probability field of occurrence of internal waves in the future time period using an internal wave prediction model based on the forecast data.
[0090] In this embodiment, the ocean internal wave forecasting device uses key environmental factors (including salinity and temperature) closely related to the internal wave generation mechanism from historical marine environmental data as feature variables. It then constructs high-precision internal wave labels based on historical satellite observation data, trains a machine learning model, and obtains an internal wave probability prediction model. By using feature variables with clear physical meaning, the device ensures strong interpretability and physical consistency with the internal wave generation mechanism (such as stratification changes). This reduces computational complexity, improves forecast timeliness, and ensures high reliability and operational applicability of the forecast results. The probability forecast results obtained using the forecasting device provided in this application can be used for risk warning and decision support in scenarios such as ship navigation and offshore platform operations.
[0091] Based on the same inventive concept, embodiments of this application also provide an electronic device. This electronic device can, for example, be integrated into the ocean internal wave forecasting system 10 shown in Figure 1. The electronic device 500 shown in Figure 5 includes a processor 510 and a memory 520. The electronic device also includes a communication interface and a communication bus, wherein the processor 510, the memory 520, and the communication interface communicate with each other via the communication bus.
[0092] The memory 520 may include high-speed random access memory (RAM) and may also include non-volatile memory, such as at least one disk storage device. The communication bus can be an ISA bus, PCI bus, or EISA bus, etc. The bus can be divided into an address bus, a data bus, a control bus, etc. For ease of representation, the bus is shown in Figure 5 as a single double-headed arrow, but this does not mean that there is only one bus or one type of communication bus.
[0093] The communication interface is used to connect to at least one user terminal and other network units through the network interface, and to send encapsulated messages to the user terminal through the network interface.
[0094] The processor 510 may be an integrated circuit chip with signal processing capabilities. In implementation, each step of the above method can be completed by the integrated logic circuitry in the hardware of the processor 510 or by instructions in software form. The processor 510 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this disclosure. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this disclosure can be directly embodied in the execution of a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software modules may reside in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other mature storage media in the art. The storage medium is located in the memory 520. The processor 510 reads the information in the memory 520 and, in conjunction with its hardware, completes the steps of the method described in the foregoing embodiment.
[0095] This application also provides a computer storage medium storing computer-executable instructions. When executed by a processor, these instructions implement the ocean internal wave forecasting method described in any of the preceding embodiments; the details of that method, and the beneficial effects it shares with this embodiment, are therefore not repeated here. For technical details not disclosed in the computer storage medium embodiments of this application, please refer to the description of the method embodiments of this application.
[0096] This application also provides a computer program product, which includes a computer program. When the computer program is executed by a processor, it is used to implement the ocean internal wave prediction method described in any of the preceding embodiments. Therefore, it will not be described again here.
[0097] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, optical disk, read-only memory (ROM), or random access memory (RAM), etc.
[0098] It should be understood that this application is not limited to the precise structure described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this application is limited only by the appended claims.
Claims
1. A method for predicting internal ocean waves, characterized in that it comprises: acquiring historical marine environmental data at multiple spatiotemporal locations within a target area; generating, based on historical satellite internal wave observation data and through spatial buffer analysis and rasterization processing, an internal wave label dataset that is spatiotemporally matched with the historical marine environmental data, wherein each internal wave label is used to indicate whether an internal wave exists at the corresponding spatiotemporal location; determining feature variables for model training from the historical marine environmental data, the feature variables including salinity and temperature; and training a machine learning model based on the historical data of the feature variables and their corresponding internal wave labels to obtain an internal wave prediction model for predicting the probability of internal wave occurrence.
2. The method according to claim 1, characterized in that the historical satellite internal wave observation data includes historical internal wave linear vector data from satellites, and generating, based on the historical satellite internal wave observation data and through spatial buffer analysis and rasterization processing, the internal wave label dataset that is spatiotemporally matched with the historical marine environmental data includes: acquiring the historical internal wave linear vector data, which includes at least one internal wave linear feature, each internal wave linear feature representing the position and shape of a historical internal wave on the sea surface; transforming the initial coordinate system of the historical internal wave linear vector data to a predefined geographic coordinate system; merging the at least one internal wave linear feature under the predefined geographic coordinate system to obtain a merged linear feature; using the merged linear feature as the center line, setting a buffer zone of preset width to generate internal wave influence area surface data, which indicates the internal wave influence area surface; and determining the spatial relationship between multiple grid points corresponding to the historical marine environmental data and the internal wave influence area surface data to generate the internal wave label dataset, wherein grid points located within the internal wave influence area surface are marked as internal wave occurring, and grid points located outside the internal wave influence area surface are marked as internal wave not occurring.
3. The method according to claim 1, characterized in that training the machine learning model based on the historical data of the feature variables and their corresponding internal wave labels includes: performing dimensionality compression on the initial historical data of the feature variables in the target region to obtain target historical data; standardizing the target historical data; constructing a training sample set with the standardized target historical data as input and the corresponding internal wave labels as output; and using a logistic regression model, configuring its class weight parameters to automatically balance the ratio of positive to negative samples, and training it on the training sample set to obtain the internal wave prediction model.
4. The method according to claim 3, characterized in that performing dimensionality compression on the initial historical data of the feature variables in the target region includes: determining the original data structure of the initial historical data, the original data structure including a depth dimension representing the vertical layers of the ocean, a time dimension, a longitude dimension, and a latitude dimension; based on the depth value indicated by the depth dimension and the time value indicated by the time dimension, determining, from the initial historical data, the target historical data whose depth layer is the ocean surface and whose time value is the target date; and storing the target historical data in a target data structure, which includes a longitude dimension and a latitude dimension.
5. The method according to claim 3, characterized in that training the logistic regression model on the training sample set includes: based on the ratio of positive to negative samples in the training sample set, setting corresponding weight coefficients for the positive and negative classes in the loss function of the logistic regression model, wherein the internal wave label corresponding to a positive sample indicates that an internal wave has occurred, and the internal wave label corresponding to a negative sample indicates that no internal wave has occurred; and training the logistic regression model using the loss function with the set weight coefficients.
6. The method according to claim 3, characterized in that it further comprises: using the internal wave prediction model to make predictions on a test set to obtain the prediction results of the internal wave prediction model on the test set; calculating the area under the ROC curve based on the prediction results and the actual internal wave labels corresponding to the test set; and evaluating the discriminative ability of the internal wave prediction model based on the area under the ROC curve.
7. The method according to any one of claims 1-6, characterized in that it further comprises: obtaining forecast data of the feature variables of the target area for a future time period; and, based on the forecast data, calculating the probability field of internal wave occurrence in the future time period using the internal wave prediction model.
8. A marine internal wave prediction device, characterized in that it comprises: an acquisition module, used to acquire historical marine environmental data at multiple spatiotemporal locations within a target area; and a processing module, used to generate, based on historical satellite internal wave observation data and through spatial buffer analysis and rasterization processing, an internal wave label dataset that is spatiotemporally matched with the historical marine environmental data acquired by the acquisition module, wherein each internal wave label is used to indicate whether an internal wave exists at the corresponding spatiotemporal location; the processing module is further used to determine feature variables for model training from the historical marine environmental data acquired by the acquisition module, the feature variables including salinity and temperature; and the processing module is further used to train a machine learning model based on the historical data of the feature variables and their corresponding internal wave labels to obtain an internal wave prediction model for predicting the probability of internal wave occurrence.
9. An electronic device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, characterized in that, when the processor executes the computer instructions, it implements the method as described in any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, are used to implement the method as described in any one of claims 1 to 7.