A self-labeling method for measurement error data of a multi-parameter temperature-salinity-depth meter
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- 青岛道万科技有限公司
- Filing Date
- 2025-05-09
- Publication Date
- 2026-06-30
AI Technical Summary
Existing technologies struggle to accurately identify errors in multi-parameter temperature, salinity, and depth (CTD) measurement data in complex marine environments, especially errors caused by abnormal coupling relationships between multiple parameters, resulting in insufficient efficiency and accuracy in marine observation data quality control.
By establishing the function matrix of the acquisition curves for temperature, salinity, and depth parameters, constructing the parameter correlation network graph, extracting key correlation paths using the minimum spanning tree algorithm, establishing a set of multi-parameter relationship coupling equations, and combining it with a neural network model based on a multi-head sparse attention mechanism, high-precision automatic labeling of temperature, salinity, and depth data errors can be achieved.
The method improves the accuracy of automatic identification and labeling of errors in multi-parameter temperature, salinity, and depth (CTD) measurement data, enhances sensitivity to marginal outliers, and adapts to changes in data characteristics across different sea areas and seasons, significantly improving the reliability and accuracy of marine observation data.
Figure CN120467413B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of electronic digital data processing technology, and specifically relates to a method for self-labeling measurement error data of a multi-parameter temperature, salinity, and depth meter.
Background Technology
[0002] In the field of marine observation, CTD (Conductivity, Temperature, and Depth) meters are crucial instruments for acquiring key parameters such as seawater temperature, salinity, and depth. Traditional CTD data processing techniques primarily rely on single-parameter threshold determination or simple statistical models for error identification. They typically employ methods such as historical data statistical analysis, interval determination, and analysis of variance to screen for outliers in the collected data. These methods can effectively identify measurement errors that significantly deviate from normal values within a single parameter range.
[0003] However, traditional techniques have significant drawbacks. First, single-parameter determination methods struggle to handle complex coupling relationships between parameters, neglecting the physical correlation between temperature, salinity, and depth. Second, simple statistical models are insufficiently sensitive to marginal outliers, especially in variable marine environments, easily misjudging normal fluctuations as anomalies or mistaking abnormal data for normal ones. Furthermore, fixed threshold determinations lack adaptability and struggle to cope with changes in data characteristics across different sea areas and seasons.
[0004] As accuracy requirements for ocean observation continue to increase, existing technologies struggle to address error identification for multi-parameter CTD instruments operating in complex marine environments. In particular, they struggle to accurately identify measurement errors caused by abnormal coupling relationships between multiple parameters, which hinders the efficiency and accuracy of ocean observation data quality control. In other words, existing technologies suffer from insufficient accuracy in automatically identifying and labeling errors in multi-parameter CTD measurement data. Therefore, there is an urgent need for an error identification method that comprehensively considers the physical correlations between parameters.
Summary of the Invention
[0005] In view of this, the present invention provides a self-labeling method for measurement error data of a multi-parameter temperature, salinity, and depth meter, which addresses the insufficient accuracy of existing techniques in automatically identifying and labeling errors in the measurement data of such meters.
[0006] This invention is implemented as follows: It provides a self-labeling method for measurement error data of a multi-parameter temperature, salinity, and depth meter, comprising: establishing function matrices for temperature parameter acquisition curves, salinity parameter acquisition curves, and depth parameter acquisition curves; segmenting newly sampled data according to time windows to form temperature parameter segment matrices, salinity parameter segment matrices, and depth parameter segment matrices; calculating the deviations between the temperature parameter segment matrices and the temperature parameter acquisition curve function matrices, the salinity parameter segment matrices and the salinity parameter acquisition curve function matrices, and the depth parameter segment matrices and the depth parameter acquisition curve function matrices, respectively; constructing a parameter correlation network diagram among temperature, salinity, and depth parameters; extracting key parameter correlation paths using the minimum spanning tree algorithm; establishing a multi-parameter relationship coupling equation set; calculating theoretical coupling values and actual coupling values using the multi-parameter relationship coupling equation set to obtain a coupling deviation matrix; inputting the deviation values and the coupling deviation matrix into a temperature, salinity, and depth error labeling neural network model to calculate error judgment values; comparing the error judgment values with preset thresholds to label the error data.
[0007] The parameter acquisition curve function matrix is a matrix composed of polynomial functions obtained by fitting historical measurement data using the least squares method; it describes the regularity of parameter changes over time or space.
[0008] The time window refers to dividing continuously collected data into multiple data segments according to fixed time intervals. Each data segment contains several continuous sampling points for local analysis of parameter variation characteristics.
[0009] The deviation value refers to the difference between the measured data and the predicted value of the theoretical model, which is quantified by calculating the Euclidean distance or Manhattan distance.
[0010] The multi-parameter coupled equation set includes temperature-salinity relationship equation, temperature-depth relationship equation, salinity-depth relationship equation, and comprehensive temperature-salinity-depth relationship equation.
[0011] The temperature-salinity relationship equation is used to describe the physical correlation between temperature and salinity. The inputs include a temperature parameter segment matrix, a salinity parameter segment matrix, a historical temperature-salinity correlation database, the rate of change of ambient temperature, and seawater density parameters. The output is a temperature-salinity theoretical coupling value matrix.
[0012] The temperature-depth relationship equation is used to describe the physical correlation between temperature and depth. The inputs include temperature parameter segment matrix, depth parameter segment matrix, temperature-depth historical correlation database, depth gradient coefficient and water pressure change rate. The output is the temperature-depth theoretical coupling value matrix.
[0013] The salinity-depth relationship equation describes the physical correlation between salinity and depth. The inputs include a salinity parameter segment matrix, a depth parameter segment matrix, a historical salinity-depth correlation database, a depth stratification coefficient, and an ocean current influence factor. The output is a salinity-depth theoretical coupling value matrix.
[0014] The comprehensive temperature-salinity-depth relationship equation integrates the physical correlation among the three parameters. The inputs include the temperature parameter segment matrix, the salinity parameter segment matrix, the depth parameter segment matrix, a historical temperature-salinity-depth correlation database, and seawater equation-of-state parameters. The output is the three-parameter comprehensive theoretical coupling value matrix.
[0015] The theoretical coupling values include the temperature-salinity theoretical coupling value matrix, the temperature-depth theoretical coupling value matrix, the salinity-depth theoretical coupling value matrix, and the three-parameter comprehensive theoretical coupling value matrix; the actual coupling values refer to the actual mutual influence values between parameters directly calculated from actual measurement data, including the temperature-salinity actual coupling value matrix, the temperature-depth actual coupling value matrix, the salinity-depth actual coupling value matrix, and the three-parameter comprehensive actual coupling value matrix.
[0016] Compared with existing technologies, this invention provides a self-labeling method for measurement error data of a multi-parameter temperature, salinity, and depth meter that establishes a parameter acquisition curve function matrix, constructs a parameter correlation network diagram, extracts key parameter correlation paths, establishes a multi-parameter relationship coupling equation set, and combines these with a neural network model, thereby achieving high-precision automatic labeling of temperature, salinity, and depth data errors.
[0017] This method overcomes the shortcomings of traditional techniques. Introducing a set of multi-parameter relationship coupling equations accounts for the physical correlation between temperature, salinity, and depth, so that error identification is no longer limited to threshold judgment of a single parameter. Replacing the traditional error judgment function with a mixture-of-experts (hybrid expert) model based on a multi-head sparse attention mechanism improves the sensitivity of the system to marginal outliers. The pre-trained neural network model also enhances the adaptive capability of the method, which can automatically adjust the error judgment criteria according to the data characteristics of different sea areas and seasons.
[0018] This invention addresses the insufficient accuracy of automatic identification and labeling of errors in measurement data from multi-parameter temperature, salinity, and depth meters, significantly improving the reliability and accuracy of marine observation data. It provides a more reliable data foundation for marine scientific research and marine environmental monitoring, and has important application value.
Attached Figure Description
[0019] Figure 1 is a flowchart of the method of the present invention.
Detailed Implementation
[0020] Figure 1 shows a flowchart of the self-labeling method for measurement error data of a multi-parameter temperature, salinity, and depth meter provided by this invention. The method includes the following steps:
[0021] S01. Establish the function matrices for the temperature parameter acquisition curve, the salinity parameter acquisition curve, and the depth parameter acquisition curve.
[0022] S02. Segment the newly sampled data according to the time window to form the temperature parameter segment matrix, the salinity parameter segment matrix, and the depth parameter segment matrix.
[0023] S03. Calculate the deviation values between the temperature parameter segment matrix and the temperature parameter acquisition curve function matrix, between the salinity parameter segment matrix and the salinity parameter acquisition curve function matrix, and between the depth parameter segment matrix and the depth parameter acquisition curve function matrix, respectively.
[0024] S04. Construct a parameter correlation network graph between the temperature, salinity, and depth parameters. Treat the three parameters as nodes in the graph and the correlation strength between parameters as edge weights. Use the minimum spanning tree algorithm to extract key parameter correlation paths and establish a set of multi-parameter relationship coupling equations between the temperature, salinity, and depth parameters.
[0025] S05. Calculate the theoretical coupling values between the temperature parameter segment matrix, the salinity parameter segment matrix, and the depth parameter segment matrix in the newly sampled data using the set of multi-parameter relationship coupling equations.
[0026] S06. Calculate the actual coupling values between the temperature parameter segment matrix, the salinity parameter segment matrix, and the depth parameter segment matrix in the newly sampled data.
[0027] S07. Calculate the coupling deviation matrix between the theoretical coupling values and the actual coupling values.
[0028] S08. Replace the traditional error determination function with a pre-trained temperature, salinity, and depth error labeling neural network model. Input the deviation values obtained in step S03 and the coupling deviation matrix obtained in step S07 into the model to calculate the error determination value.
[0029] S09. Compare the error determination value with the preset threshold. When the error determination value is greater than the preset threshold, mark the corresponding data point as an error.
[0030] The parameter acquisition curve function matrix is a matrix composed of polynomial functions obtained by fitting historical measurement data using the least squares method; it describes the regularity of parameter changes over time or space.
[0031] The time window refers to dividing continuously collected data into multiple data segments according to fixed time intervals. Each data segment contains several continuous sampling points for local analysis of parameter variation characteristics.
[0032] The deviation value refers to the difference between the measured data and the predicted value of the theoretical model, which is quantified by calculating the Euclidean distance or Manhattan distance.
[0033] The multi-parameter relationship coupling equation set includes the temperature-salinity relationship equation, the temperature-depth relationship equation, the salinity-depth relationship equation, and the comprehensive temperature-salinity-depth relationship equation.
[0034] The temperature-salinity relationship equation is used to describe the physical correlation between temperature and salinity. The inputs include a temperature parameter segment matrix, a salinity parameter segment matrix, a temperature-salinity historical correlation database, the rate of change of ambient temperature and seawater density parameters, and the output is a temperature-salinity theoretical coupling value matrix.
[0035] The temperature-depth relationship equation is used to describe the physical correlation between temperature and depth. The inputs include temperature parameter segment matrix, depth parameter segment matrix, temperature-depth historical correlation database, depth gradient coefficient and water pressure change rate. The output is temperature-depth theoretical coupling value matrix.
[0036] The salinity-depth relationship equation is used to describe the physical correlation between salinity and depth. The inputs include a salinity parameter segment matrix, a depth parameter segment matrix, a historical salinity-depth correlation database, a depth stratification coefficient, and an ocean current influence factor. The output is a salinity-depth theoretical coupling value matrix.
[0037] The comprehensive temperature-salinity-depth relationship equation integrates the physical correlation among the three parameters. The inputs include the temperature parameter segment matrix, the salinity parameter segment matrix, the depth parameter segment matrix, a historical temperature-salinity-depth correlation database, and seawater equation-of-state parameters. The output is the three-parameter comprehensive theoretical coupling value matrix.
[0038] The theoretical coupling value is the output of the multi-parameter relationship coupling equation set calculated in step S05, including the temperature-salinity theoretical coupling value matrix, the temperature-depth theoretical coupling value matrix, the salinity-depth theoretical coupling value matrix, and the three-parameter comprehensive theoretical coupling value matrix.
[0039] The actual coupling value refers to the actual mutual influence value between parameters calculated directly from the measurement data, including the temperature-salinity actual coupling value matrix, the temperature-depth actual coupling value matrix, the salinity-depth actual coupling value matrix, and the three-parameter comprehensive actual coupling value matrix.
[0040] The coupling deviation matrix is a numerical matrix that represents the degree of difference between the theoretical coupling relationship and the actual coupling relationship. The larger the element value, the higher the probability of data anomalies.
[0041] The error determination function is used to comprehensively evaluate the degree of anomaly of the data points. The inputs include temperature deviation value, salinity deviation value, depth deviation value, coupling deviation matrix, historical error distribution feature vector, and environmental interference factor. The output is the error determination value.
[0042] Error marking involves adding specific identifiers to data records. These identifiers can be obtained by using general methods such as linking markers or adding text, or by storing the addresses of all data records with errors in a dataset. Other general methods can also be used to indicate that the data point is abnormal or unreliable, facilitating identification and processing during subsequent data analysis.
[0043] The temperature-salinity-depth error labeling neural network model is a mixture-of-experts (hybrid expert) model based on a multi-head sparse attention mechanism. It includes an input layer, a multi-head sparse attention layer, a parameter encoding layer, a multi-layer perceptron layer, and an output layer. The multi-head sparse attention layer handles the relationships between different parameters. The parameter encoding layer maps the original data to a high-dimensional feature space. The multi-layer perceptron layer performs feature fusion and nonlinear mapping. The output layer generates the final error judgment value.
[0044] The steps for establishing the training dataset in the pre-training process of the temperature, salinity, and depth error labeling neural network model specifically include collecting a large amount of historical temperature, salinity, and depth measurement data and manually labeled error samples, cleaning and normalizing the collected data, segmenting the data according to the time window length and extracting features, calculating the statistical correlation between each feature and the manually labeled error, selecting feature combinations with strong correlation to form training sample pairs, and dividing the sample set into training set, validation set and test set in a ratio of 8:1:1.
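The 8:1:1 division described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `split_samples`, the fixed random seed, and the choice to shuffle before splitting are assumptions. For time-ordered windows, a chronological split may be preferable to shuffling.

```python
import random


def split_samples(samples, ratios=(8, 1, 1), seed=0):
    """Shuffle samples and split them into train/validation/test by ratio."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # assumption: samples may be shuffled
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # any remainder goes to the test set
    return train, val, test


# Hypothetical sample identifiers standing in for (feature, label) pairs.
train, val, test = split_samples(range(100))
print(len(train), len(val), len(test))  # 80 10 10
```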
[0045] The pre-training steps of the temperature, salinity, and depth error labeling neural network model specifically include: initializing network parameters and setting the learning rate, batch size, and number of training epochs; running forward propagation on the training set to calculate the loss; running backpropagation and updating parameters with the adaptive moment estimation (Adam) optimization algorithm; periodically evaluating model performance on the validation set and adjusting hyperparameters; giving extra emphasis to samples of the different error types to address the sample imbalance problem; terminating training early when validation set performance no longer improves, to avoid overfitting; and using the test set for final performance evaluation and saving the trained model parameters.
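The early-termination rule in the pre-training steps can be illustrated in isolation. The sketch below assumes that validation loss is the monitored metric and that a `patience` of three epochs is used; neither value is given in the text, and the loss sequence is made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Scan per-epoch validation losses and stop once no improvement has been
    seen for `patience` consecutive epochs.

    Returns (best_epoch, best_loss, last_epoch_run).
    """
    best_loss = float("inf")
    best_epoch = -1
    last_epoch = -1
    stale = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        last_epoch = epoch
        if loss < best_loss:
            best_loss, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break  # validation performance no longer improves
    return best_epoch, best_loss, last_epoch


# Hypothetical validation losses; training stops before the final value is reached.
result = early_stopping_epoch([0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5])
print(result)  # (2, 0.6, 5)
```

In practice the parameters saved at `best_epoch` would be the ones restored and evaluated on the test set.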
[0046] In this multi-head sparse attention mechanism, the number of heads matches the number of time windows in step S02, the sparsity of the sparse attention matrix is adapted to the sparsity characteristics of the coupling deviation matrix calculated in step S07, and the hidden layer dimension of the attention layer is equal to the sum of the dimensions of the three deviation values calculated in step S03.
[0047] The specific implementation of the above steps is described in detail below. Step S01 involves constructing a function matrix of temperature, salinity, and depth parameter acquisition curves using historical measurement data. First, a large amount of historical temperature, salinity, and depth measurement data is collected and sorted according to time series or spatial distribution. For each parameter's historical data, a least-squares method is used to perform polynomial fitting. The order of the fitted polynomial is selected based on the data complexity, generally between 3 and 5. For the temperature parameter, a polynomial function characterizing temperature changes with time or depth is fitted; for the salinity parameter, a polynomial function characterizing salinity changes with time or depth is fitted; for the depth parameter, a polynomial function characterizing depth changes with time is fitted. All fitted functions are organized into a matrix, with each row corresponding to a time point or depth point and each column corresponding to a polynomial coefficient. The purpose of this step is to establish a benchmark model of parameter changes, providing a theoretical basis for subsequent deviation calculations. By fitting the regular characteristics of historical data, abnormal change patterns in new data can be effectively identified.
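A least-squares polynomial fit of the kind described in step S01 can be sketched with the normal equations. This is a minimal standard-library illustration under stated assumptions: the helper names `fit_polynomial` and `evaluate` are hypothetical, the depth axis is normalized to [0, 1] to keep the system well conditioned, and the temperature profile is synthetic. A production implementation would more likely use a numerical library.

```python
def fit_polynomial(xs, ys, order=3):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ..., c_order] of c0 + c1*x + c2*x**2 + ...
    """
    n = order + 1
    # Normal equations (V^T V) c = V^T y with V[i][j] = xs[i] ** j.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the fitted polynomial at x using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Synthetic temperature profile (deg C) against normalized depth in [0, 1].
depths = [i / 10 for i in range(11)]
temps = [20.0 - 8.0 * d + 3.0 * d ** 2 - 1.0 * d ** 3 for d in depths]
coeffs = fit_polynomial(depths, temps, order=3)
print([round(c, 6) for c in coeffs])
```

One row of coefficients per parameter (temperature, salinity, depth) would then form the function matrix used as the baseline in the later steps.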
[0048] The specific implementation of step S02 involves segmenting the newly sampled data into time windows. First, an appropriate time window length is determined, typically chosen based on the data sampling frequency and the changing characteristics of the marine environment, with a typical value of 5-15 minutes. Continuously collected temperature, salinity, and depth data are divided into several data segments according to the set time windows. For temperature data, a temperature parameter segment matrix is formed, where each row represents a time window and each column represents a sampling point within that window; for salinity data, a salinity parameter segment matrix is formed; and for depth data, a depth parameter segment matrix is formed. Statistical features are extracted from each data segment, including mean, variance, slope, and kurtosis, which are used as the feature vector for that segment. The purpose of data segmentation is to decompose long-term series into shorter time periods, facilitating local analysis of parameter variation characteristics and improving the temporal resolution and sensitivity of anomaly detection.
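The segmentation and feature extraction in step S02 might look like the sketch below. It assumes uniformly sampled data, so a time window corresponds to a fixed number of samples (window duration times sampling rate), requires at least two samples per window, and drops a trailing incomplete window; these choices and the function names are illustrative assumptions.

```python
from statistics import fmean, pvariance


def segment(samples, window_size):
    """Split samples into consecutive full windows (rows of the segment matrix)."""
    n_windows = len(samples) // window_size  # a trailing partial window is dropped
    return [samples[i * window_size:(i + 1) * window_size] for i in range(n_windows)]


def slope(values):
    """Least-squares slope of the values against their sample index."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = fmean(values)
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def kurtosis(values):
    """Population kurtosis (fourth standardized moment); 0.0 for a constant window."""
    mean = fmean(values)
    var = pvariance(values, mean)
    if var == 0:
        return 0.0
    return fmean([(v - mean) ** 4 for v in values]) / var ** 2


def window_features(window):
    """Feature vector of one window: mean, variance, slope, kurtosis."""
    return {
        "mean": fmean(window),
        "variance": pvariance(window),
        "slope": slope(window),
        "kurtosis": kurtosis(window),
    }


# Hypothetical temperature samples (deg C); three samples per window.
temperature = [10.0, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6]
matrix = segment(temperature, 3)
features = [window_features(w) for w in matrix]
print(len(matrix), round(features[0]["slope"], 6))  # 2 0.1
```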
[0049] The specific implementation of step S03 involves calculating the deviation between the parameter segment matrix and the parameter acquisition curve function matrix. For each temperature parameter segment, the Euclidean distance or Manhattan distance is calculated between it and the predicted value of the temperature parameter acquisition curve function within the corresponding time interval to obtain the temperature deviation value. During the calculation, the difference between the actual temperature data points and the theoretical model prediction values within each time window is first calculated, and then the distance is calculated. For the Euclidean distance, the calculation formula is based on the square root of the sum of squares of the deviations at each point; for the Manhattan distance, it is based on the sum of the absolute values of the deviations at each point. Similarly, the deviation values between the salinity parameter segment and the salinity parameter acquisition curve function, and the deviation values between the depth parameter segment and the depth parameter acquisition curve function are calculated. The three types of deviation values are organized into a deviation value matrix for subsequent anomaly detection. The purpose of calculating the deviation value is to quantify the degree of difference between the newly sampled data and historical patterns, providing a basic basis for error identification. The larger the deviation value, the further the data deviates from historical patterns, and the higher the probability of an anomaly.
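The two distance measures named in step S03 follow directly from their definitions. In this sketch, `measured` is one window of a parameter segment and `predicted` holds the baseline curve evaluated at the same sample positions; both names and the sample values are illustrative.

```python
import math


def euclidean_deviation(measured, predicted):
    """Square root of the sum of squared point-wise differences."""
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)))


def manhattan_deviation(measured, predicted):
    """Sum of absolute point-wise differences."""
    return sum(abs(m - p) for m, p in zip(measured, predicted))


# Hypothetical window: measured temperatures versus the fitted baseline (deg C).
measured = [13.0, 14.0, 10.0]
predicted = [10.0, 10.0, 10.0]
print(euclidean_deviation(measured, predicted))  # 5.0
print(manhattan_deviation(measured, predicted))  # 7.0
```

The Euclidean form penalizes a single large excursion more heavily, while the Manhattan form grows linearly with each point's error.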
[0050] The specific implementation of step S04 involves constructing a parameter correlation network graph and establishing a set of multi-parameter relationship coupling equations. First, temperature, salinity, and depth are used as nodes in the network graph. Based on historical data, mutual information or correlation coefficients between parameters are calculated as edge weights. For example, the Pearson correlation coefficient or mutual information value between temperature and salinity, temperature and depth, and salinity and depth is calculated; a larger weight indicates a stronger correlation between parameters. Using a minimum spanning tree algorithm such as Prim's algorithm or Kruskal's algorithm, key parameter correlation paths are extracted from the parameter correlation network graph, retaining the edges with the strongest correlations. Based on the extracted key correlation paths, a set of multi-parameter relationship coupling equations is established, including equations for temperature-salinity relationship, temperature-depth relationship, salinity-depth relationship, and a comprehensive temperature-salinity-depth relationship. The coupling equations can be established using multiple regression analysis, neural networks, or physical oceanographic models. The purpose of constructing the parameter correlation network is to uncover the inherent physical correlations between parameters. The minimum spanning tree algorithm helps reduce complexity and retain the most important correlations, laying the foundation for subsequent coupling deviation analysis.
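The graph construction and path extraction in step S04 can be sketched with Pearson correlation and Kruskal's algorithm. One point needs care: a minimum spanning tree keeps the edges with the smallest weights, so to retain the strongest correlations this sketch converts each correlation r into a distance 1 - |r| before running Kruskal. That conversion is an assumption about how the edge weights are meant to be used, and the sample data are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long, non-constant series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def key_paths(series):
    """Kruskal's algorithm on distances 1 - |r|; returns the kept edges (a, b, r)."""
    names = list(series)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(series[a], series[b])
            edges.append((1.0 - abs(r), a, b, r))  # strong correlation = short edge
    edges.sort()

    parent = {name: name for name in names}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    tree = []
    for _, a, b, r in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two components, so keep it
            parent[root_a] = root_b
            tree.append((a, b, r))
    return tree


# Hypothetical historical profile.
history = {
    "temperature": [20.0, 18.5, 16.0, 14.2, 12.0, 10.1],
    "salinity": [34.0, 34.1, 34.3, 34.2, 34.6, 34.5],
    "depth": [0.0, 10.0, 20.0, 30.0, 40.0, 50.0],
}
tree = key_paths(history)
for a, b, r in tree:
    print(f"{a} - {b}: r = {r:.3f}")
```

With only three nodes the tree always keeps two of the three edges, so the effect is to drop the weakest pairwise relationship.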
[0051] The specific implementation of step S05 involves calculating theoretical coupling values using a set of multi-parameter relationship coupling equations. For the temperature-salinity relationship equation, the inputs include a temperature parameter segment matrix, a salinity parameter segment matrix, a historical temperature-salinity correlation database, the rate of change of ambient temperature, and seawater density parameters. The temperature-salinity theoretical coupling value matrix is then calculated using the equations. For the temperature-depth relationship equation, the inputs include a temperature parameter segment matrix, a depth parameter segment matrix, a historical temperature-depth correlation database, the depth gradient coefficient, and the rate of change of water pressure. The temperature-depth theoretical coupling value matrix is then calculated. For the salinity-depth relationship equation, the inputs include a salinity parameter segment matrix, a depth parameter segment matrix, a historical salinity-depth correlation database, the depth stratification coefficient, and ocean current influence factors. The salinity-depth theoretical coupling value matrix is then calculated. For the comprehensive temperature-salinity-depth relationship equation, the inputs include three parameter segment matrices, a historical temperature-salinity-depth correlation database, and seawater state equation parameters. The comprehensive three-parameter theoretical coupling value matrix is then calculated. The purpose of calculating the theoretical coupling values is to predict the ideal correlation state between parameters based on historical data and physical models, providing a benchmark for comparison with actual coupling values.
[0052] The specific implementation of step S06 involves calculating the actual coupling values between parameter segment matrices in the newly sampled data. For temperature and salinity parameters, the covariance or cross-correlation coefficient of temperature and salinity within each time window is directly calculated to form a temperature-salinity actual coupling value matrix. For temperature and depth parameters, the covariance or cross-correlation coefficient of temperature and depth within each time window is calculated to form a temperature-depth actual coupling value matrix. For salinity and depth parameters, the covariance or cross-correlation coefficient of salinity and depth within each time window is calculated to form a salinity-depth actual coupling value matrix. For the comprehensive relationship of the three parameters, multivariate statistical methods such as principal component analysis or canonical correlation analysis are used to calculate the comprehensive correlation strength of the three parameters, forming a comprehensive three-parameter actual coupling value matrix. The purpose of calculating the actual coupling values is to directly extract the interrelationships between parameters from the actual measurement data, providing a practical basis for comparison with theoretical coupling values.
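The pairwise part of step S06 can be sketched as a per-window correlation coefficient. The sketch assumes the two segment matrices have the same shape, uses the population covariance, and treats a window in which either parameter is constant as uncoupled (value 0.0); that last convention is an assumption, not something the text specifies.

```python
import math


def covariance(xs, ys):
    """Population covariance of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def actual_coupling(segments_a, segments_b):
    """One correlation coefficient per time window for two parameters."""
    values = []
    for window_a, window_b in zip(segments_a, segments_b):
        var_a = covariance(window_a, window_a)
        var_b = covariance(window_b, window_b)
        if var_a == 0 or var_b == 0:
            values.append(0.0)  # assumption: a constant window counts as uncoupled
        else:
            values.append(covariance(window_a, window_b) / math.sqrt(var_a * var_b))
    return values


# Hypothetical segment matrices: two windows of three samples each.
temperature_segments = [[20.0, 19.0, 18.0], [18.0, 18.0, 18.0]]
depth_segments = [[0.0, 5.0, 10.0], [10.0, 15.0, 20.0]]
coupling = actual_coupling(temperature_segments, depth_segments)
print([round(c, 6) for c in coupling])  # [-1.0, 0.0]
```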
[0053] The specific implementation of step S07 involves calculating the coupling deviation matrix between the theoretical and actual coupling values. The temperature-salinity coupling deviation matrix is obtained by taking the difference between the theoretical and actual temperature-salinity coupling value matrices. Similarly, the temperature-depth coupling deviation matrix is obtained by taking the difference between the theoretical and actual temperature-depth coupling value matrices, and the salinity-depth coupling deviation matrix is obtained by taking the difference between the theoretical and actual salinity-depth coupling value matrices. Finally, the three-parameter comprehensive coupling deviation matrix is obtained by taking the difference between the three-parameter comprehensive theoretical and actual coupling value matrices. All coupling deviation matrices are then normalized for easier subsequent comprehensive judgment. The purpose of calculating the coupling deviation is to quantify the degree of difference between the actual parameter relationships and theoretical expectations; a larger deviation indicates a higher probability of data anomalies. Coupling deviation analysis can capture abnormal patterns that might be overlooked by single-parameter deviation analysis.
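Step S07 can be illustrated as an element-wise difference followed by normalization. Two choices here are assumptions rather than details from the text: the absolute value of the difference is taken so that a larger element always means a larger mismatch, and min-max scaling to [0, 1] is used as the normalization. The matrices are made-up examples.

```python
def coupling_deviation(theoretical, actual):
    """Element-wise absolute difference of two equally shaped matrices."""
    return [
        [abs(t - a) for t, a in zip(t_row, a_row)]
        for t_row, a_row in zip(theoretical, actual)
    ]


def min_max_normalize(matrix):
    """Scale all elements to [0, 1]; a constant matrix maps to all zeros."""
    flat = [value for row in matrix for value in row]
    low, high = min(flat), max(flat)
    if high == low:
        return [[0.0 for _ in row] for row in matrix]
    return [[(value - low) / (high - low) for value in row] for row in matrix]


# Hypothetical temperature-salinity coupling values (rows: windows).
theoretical = [[0.9, 0.8], [0.7, 0.9]]
actual = [[0.9, 0.4], [0.5, 0.1]]
deviation = min_max_normalize(coupling_deviation(theoretical, actual))
print([[round(v, 3) for v in row] for row in deviation])  # [[0.0, 0.5], [0.25, 1.0]]
```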
[0054] The specific implementation of step S08 involves calculating the error determination value using a pre-trained temperature, salinity, and depth error labeling neural network model. The temperature deviation value, salinity deviation value, and depth deviation value obtained in step S03, along with the coupling deviation matrices obtained in step S07, are combined to form a feature vector. This feature vector is then input into the pre-trained model. The model has a mixture-of-experts (hybrid expert) structure based on a multi-head sparse attention mechanism. It first receives the feature vector through the input layer, then processes the relationships between different parameters in the multi-head sparse attention layer, with the number of heads matching the number of time windows. The output of the attention layer is mapped to a high-dimensional feature space by the parameter encoding layer, and then undergoes feature fusion and nonlinear mapping in the multi-layer perceptron layer. Finally, the output layer generates an error determination value ranging from 0 to 1, with values closer to 1 indicating a higher probability of data anomalies. The advantage of using a neural network model instead of a traditional error determination function lies in its ability to adaptively learn complex data anomaly patterns, improving the accuracy and robustness of error identification.
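The text does not say how sparsity is imposed in the attention layer. One common approach, shown here purely as an assumption, is top-k attention: each query keeps only its k highest-scoring keys and applies softmax over those. The sketch covers a single head in plain Python with made-up vectors; it is not the patent's network.

```python
import math


def sparse_attention(query, keys, values, top_k=2):
    """Single-head scaled dot-product attention restricted to the top_k keys.

    Returns (weights, output); weights of keys outside the top_k are exactly 0.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # Indices of the top_k largest scores (assumed sparsity rule).
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    peak = max(scores[i] for i in keep)  # subtract the peak for numerical stability
    exps = {i: math.exp(scores[i] - peak) for i in keep}
    total = sum(exps.values())
    weights = [exps.get(i, 0.0) / total for i in range(len(keys))]
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output


# Hypothetical two-dimensional query, keys, and values.
weights, output = sparse_attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    values=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    top_k=2,
)
print([round(w, 3) for w in weights])
```

A multi-head layer would run several such heads in parallel on different learned projections and concatenate their outputs.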
[0055] The specific implementation of step S09 involves comparing the error judgment value with a preset threshold and marking the error. First, an appropriate error judgment threshold is set, typically balancing application requirements and acceptable false positive / false negative rates, with a typical value of 0.7-0.85. The error judgment value calculated in step S08 is compared with the preset threshold. When the error judgment value for a certain time window or data point exceeds the preset threshold, a specific identifier is added to the corresponding data record, such as setting a flag bit to 1 or adding a special marker symbol. Error marking can be divided into different levels, such as minor anomalies (0.7-0.85), moderate anomalies (0.85-0.95), and severe anomalies (0.95-1.0). The purpose of error marking is to provide a reference for subsequent data processing, facilitating data filtering and quality control by data analysts or automated systems. Marked data can be processed by methods such as removal, correction, or weight reduction according to application requirements.
[0056] Furthermore, the specific structure of the temperature-salinity-depth error labeling neural network model is as follows. The model adopts a mixture-of-experts structure based on a multi-head sparse attention mechanism. The input layer receives feature vectors comprising the three parameter deviation values and the elements of each coupling deviation matrix. The multi-head sparse attention layer consists of multiple parallel attention sub-layers, each focusing on a different feature combination, with the number of heads matched to the number of time windows. The attention mechanism determines feature importance by calculating the similarity between query vectors and key vectors and using the resulting weights to combine the value vectors; sparsity constraints are introduced to reduce computational complexity. The parameter encoding layer uses a fully connected network to map the original features to a high-dimensional latent space, enhancing feature representation. The multilayer perceptron layer consists of multiple fully connected layers, each followed by batch normalization and a ReLU activation function, and is responsible for feature fusion and nonlinear mapping. The output layer is a single neuron with a sigmoid activation function that outputs an error judgment value in the range 0-1. The model also includes multiple expert networks and a gating network, which dynamically selects the most suitable expert network for prediction based on the input features. The network is trained with the backpropagation algorithm; the loss function is either cross-entropy loss or binary focal loss, and the optimizer is Adam.
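As an illustration of the binary focal loss mentioned above, the following is a minimal sketch in plain Python. The function name `binary_focal_loss` and the default values of `gamma` and `alpha` are assumptions chosen for illustration; the description does not specify them.

```python
import math

def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over a batch.

    gamma (focusing parameter) and alpha (class weight) are assumed
    defaults, not values taken from the description.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)            # avoid log(0)
        p_t = p if y == 1 else 1.0 - p              # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha  # class-dependent weight
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)

# A confidently correct prediction contributes far less than an uncertain one.
easy = binary_focal_loss([1], [0.95])
hard = binary_focal_loss([1], [0.55])
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary cross-entropy loss, which shows how the focusing term down-weights easy samples when anomalous records are rare.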
[0057] Furthermore, the training dataset for the temperature-salinity-depth error labeling neural network model is established as follows. First, no fewer than 10,000 sets of historical temperature-salinity-depth measurement data with manually labeled error samples are collected, covering different sea areas, seasons, and environmental conditions. The collected data is cleaned by removing obviously unreasonable values, such as temperature, salinity, or depth records outside the physically possible range. The data is normalized using min-max or Z-score normalization to scale the parameter values to a uniform range. The continuous data is segmented according to the set time window length, and for each segment statistical features (such as mean, standard deviation, slope, and kurtosis) and frequency-domain features (such as power spectral density) are extracted. The mutual information or correlation coefficient between each extracted feature and the manually labeled error is calculated, and features whose mutual information or correlation coefficient exceeds 0.3 are selected to form the effective feature set. The samples are randomly divided into training, validation, and test sets in an 8:1:1 ratio, keeping the proportion of abnormal to normal samples consistent across the sets. To address sample imbalance, a combination of oversampling and undersampling, or an adjustment of sample weights, is used so that the model recognizes all types of errors comparably well. Data augmentation techniques, such as adding random noise and sliding the time windows, are used to expand the training samples and improve the model's generalization and robustness.
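The 8:1:1 split with a consistent abnormal-to-normal proportion can be sketched as a stratified split. The helper name `stratified_split`, the seed, and the synthetic labels below are illustrative assumptions only.

```python
import random

def stratified_split(labels, ratios=(0.8, 0.1, 0.1), seed=0):
    """Return (train, val, test) index lists, splitting each class separately
    so that the anomaly/normal proportion is preserved in every subset."""
    rng = random.Random(seed)
    splits = ([], [], [])
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(ratios[0] * len(idx))
        n_val = round(ratios[1] * len(idx))
        splits[0].extend(idx[:n_train])
        splits[1].extend(idx[n_train:n_train + n_val])
        splits[2].extend(idx[n_train + n_val:])   # remainder goes to the test set
    return splits

# Hypothetical labels: 100 abnormal (1) and 900 normal (0) samples.
labels = [1] * 100 + [0] * 900
train, val, test = stratified_split(labels)
```

Each subset here keeps the same 10% share of abnormal samples as the full dataset.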
[0058] The mathematical model or calculation process involved in this invention will be described in detail below.
[0059] The establishment of the temperature parameter acquisition curve function matrix, salinity parameter acquisition curve function matrix, and depth parameter acquisition curve function matrix in step S01 involves polynomial fitting calculations, as specifically shown below:
[0060] The temperature parameter acquisition curve can be expressed as follows using polynomial fitting:
[0061] $$T(t)=\sum_{i=0}^{n} a_i t^{i}+\varepsilon_T$$;
[0062] In the formula, $T(t)$ is the fitted temperature function; $t$ is the time variable; $a_i$ are the temperature polynomial coefficients; $n$ is the order of the polynomial, typically taken as 3-5; and $\varepsilon_T$ is the temperature fitting error term.
[0063] The salinity parameter acquisition curve can be expressed as follows using polynomial fitting:
[0064] $$S(t)=\sum_{i=0}^{n} b_i t^{i}+\varepsilon_S$$;
[0065] In the formula, $S(t)$ is the fitted salinity function; $t$ is the time variable; $b_i$ are the salinity polynomial coefficients; $n$ is the order of the polynomial, typically taken as 3-5; and $\varepsilon_S$ is the salinity fitting error term.
[0066] The depth parameter acquisition curve can be expressed as follows using polynomial fitting:
[0067] $$D(t)=\sum_{i=0}^{n} c_i t^{i}+\varepsilon_D$$;
[0068] In the formula, $D(t)$ is the fitted depth function; $t$ is the time variable; $c_i$ are the depth polynomial coefficients; $n$ is the order of the polynomial, typically taken as 3-5; and $\varepsilon_D$ is the depth fitting error term.
[0069] The temperature parameter acquisition curve function matrix $F_T$ can be represented as:
[0070] $$F_T=\begin{bmatrix} a_{1,0} & a_{1,1} & \cdots & a_{1,n}\\ a_{2,0} & a_{2,1} & \cdots & a_{2,n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m,0} & a_{m,1} & \cdots & a_{m,n}\end{bmatrix}$$;
[0071] In the formula, $a_{p,i}$ denotes the value of the $i$-th temperature polynomial coefficient for the $p$-th time period or depth segment; $m$ is the number of time periods or depth segments; and $n$ is the order of the polynomial.
[0072] Similarly, the salinity parameter acquisition curve function matrix $F_S$ and the depth parameter acquisition curve function matrix $F_D$ can be represented as:
[0073] $$F_S=\left[b_{p,i}\right]_{m\times(n+1)}$$; $$F_D=\left[c_{p,i}\right]_{m\times(n+1)}$$;
[0074] The polynomial coefficients are obtained using the least squares method. Taking temperature as an example, the following optimization problem is solved:
[0075] $$\min_{a_0,\dots,a_n}\sum_{j=1}^{N}\left(T_j-\hat{T}(t_j)\right)^{2}$$;
[0076] In the formula, $T_j$ is the actual temperature value at the $j$-th time point; $\hat{T}(t_j)=\sum_{i=0}^{n} a_i t_j^{i}$ is the fitted temperature value at the corresponding time point; and $N$ is the number of historical data points. Once the coefficients $a_i$ are obtained, the temperature parameter acquisition curve function matrix $F_T$ can be constructed.
[0077] Polynomial (power-series) forms are used for these fits because ocean parameters typically exhibit complex nonlinear variations, and polynomials can approximate a wide range of curves while remaining computationally simple and efficient. Higher-order terms capture finer characteristics of parameter changes, while lower-order terms reflect the overall trend. The error terms account for measurement errors and environmental disturbances, improving the robustness of the model.
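The least-squares polynomial fit of step S01 can be sketched with NumPy as follows. The function names, the synthetic cubic temperature trend, and the choice of order 3 are illustrative assumptions; `numpy.polyfit` returns coefficients highest order first, so they are reversed to match the ordering of the coefficients from lowest to highest order.

```python
import numpy as np

def fit_acquisition_curve(t, values, order=3):
    """Least-squares polynomial fit; returns coefficients lowest order first."""
    return np.polyfit(t, values, order)[::-1]

def evaluate_curve(coeffs, t):
    """Evaluate sum_i coeffs[i] * t**i for coefficients ordered lowest first."""
    return np.polyval(coeffs[::-1], t)

def build_function_matrix(segments, order=3):
    """Stack one coefficient row per time period or depth segment."""
    return np.vstack([fit_acquisition_curve(t, v, order) for t, v in segments])

# Synthetic example (assumed data): a cubic temperature trend over time.
t = np.linspace(0.0, 10.0, 50)
temperature = 18.0 - 0.5 * t + 0.02 * t**2 - 0.001 * t**3
a = fit_acquisition_curve(t, temperature, order=3)
fitted = evaluate_curve(a, t)
F_T = build_function_matrix([(t, temperature), (t, temperature + 1.0)])
```

Each row of `F_T` holds the coefficients of one segment, so the matrix has one row per segment and one column per polynomial coefficient.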
[0078] The time window segmentation process in step S02 involves the construction of a parameter segment matrix, as shown below:
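[0079] The temperature parameter segment matrix $X_T$ is represented as:
[0080] $$X_T=\begin{bmatrix} T_{1,1} & T_{1,2} & \cdots & T_{1,M}\\ T_{2,1} & T_{2,2} & \cdots & T_{2,M}\\ \vdots & \vdots & \ddots & \vdots\\ T_{K,1} & T_{K,2} & \cdots & T_{K,M}\end{bmatrix}$$;
[0081] In the formula, $T_{k,j}$ denotes the temperature value at the $j$-th sampling point within the $k$-th time window; $K$ is the number of time windows; and $M$ is the number of sampling points within each window.
[0082] Similarly, the salinity parameter segment matrix $X_S$ and the depth parameter segment matrix $X_D$ can be represented as:
[0083] $$X_S=\left[S_{k,j}\right]_{K\times M}$$; $$X_D=\left[D_{k,j}\right]_{K\times M}$$;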
[0084] In step S03, the deviation between the parameter segment matrix and the parameter acquisition curve function matrix is calculated, as follows:
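[0085] The temperature deviation value $d_{T,k}$ is calculated using the Euclidean distance:
[0086] $$d_{T,k}=\sqrt{\sum_{j=1}^{M}\left(T_{k,j}-\hat{T}(t_{k,j})\right)^{2}}$$;
[0087] In the formula, $T_{k,j}$ is the actual temperature value at the $j$-th sampling point within the $k$-th time window, and $\hat{T}(t_{k,j})$ is the fitted temperature value at the corresponding time point.
[0088] Or using the Manhattan distance:
[0089] $$d_{T,k}=\sum_{j=1}^{M}\left|T_{k,j}-\hat{T}(t_{k,j})\right|$$;
[0090] Similarly, the salinity deviation value $d_{S,k}$ and the depth deviation value $d_{D,k}$ can be represented as:
[0091] $$d_{S,k}=\sqrt{\sum_{j=1}^{M}\left(S_{k,j}-\hat{S}(t_{k,j})\right)^{2}}$$ or $$d_{S,k}=\sum_{j=1}^{M}\left|S_{k,j}-\hat{S}(t_{k,j})\right|$$;
[0092] $$d_{D,k}=\sqrt{\sum_{j=1}^{M}\left(D_{k,j}-\hat{D}(t_{k,j})\right)^{2}}$$ or $$d_{D,k}=\sum_{j=1}^{M}\left|D_{k,j}-\hat{D}(t_{k,j})\right|$$;
[0093] The deviation matrix $\Delta$ can be represented as:
[0094] $$\Delta=\begin{bmatrix} d_{T,1} & d_{S,1} & d_{D,1}\\ d_{T,2} & d_{S,2} & d_{D,2}\\ \vdots & \vdots & \vdots\\ d_{T,K} & d_{S,K} & d_{D,K}\end{bmatrix}$$;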
[0095] Euclidean distance and Manhattan distance are chosen as deviation measures because they quantify the degree of data deviation from geometric and path perspectives, respectively. Euclidean distance measures the straight-line distance in space and, because it squares each difference, is suited to detecting abrupt changes; Manhattan distance accumulates the absolute differences along the coordinate axes, making it more sensitive to gradual anomalies.
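A minimal sketch of the two deviation measures for a single time window follows. The sample values and the constant baseline standing in for the fitted curve are assumptions made for illustration.

```python
import math

def euclidean_deviation(actual, fitted):
    """Square root of the summed squared differences within one window."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, fitted)))

def manhattan_deviation(actual, fitted):
    """Sum of absolute differences within one window."""
    return sum(abs(a - f) for a, f in zip(actual, fitted))

# Hypothetical window with one spike at the third sampling point.
window = [10.0, 10.1, 13.0, 10.2]
baseline = [10.0, 10.0, 10.0, 10.0]   # stands in for the fitted values
d_euc = euclidean_deviation(window, baseline)
d_man = manhattan_deviation(window, baseline)
```

In this example the spike accounts for nearly all of the Euclidean value, while the Manhattan value also accumulates the two small offsets.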
[0096] In step S04, a parameter association network graph is constructed, and the association strength between parameters is calculated as follows:
[0097] The Pearson correlation coefficient between parameters is calculated as:
[0098] $$r_{TS,k}=\frac{\sum_{j=1}^{M}\left(T_{k,j}-\bar{T}_k\right)\left(S_{k,j}-\bar{S}_k\right)}{\sqrt{\sum_{j=1}^{M}\left(T_{k,j}-\bar{T}_k\right)^{2}\sum_{j=1}^{M}\left(S_{k,j}-\bar{S}_k\right)^{2}}}$$;
[0099] In the formula, $r_{TS,k}$ is the correlation coefficient between temperature and salinity within the $k$-th time window; $\bar{T}_k$ and $\bar{S}_k$ are the average values of temperature and salinity within the window, respectively.
[0100] Similarly, $r_{TD,k}$ and $r_{SD,k}$ represent the correlation coefficients between temperature and depth, and between salinity and depth, respectively.
[0101] Alternatively, mutual information can be used to calculate the correlation strength of the parameters:
[0102] $$I_{TS,k}=\sum_{\tau}\sum_{\sigma} p(\tau,\sigma)\log\frac{p(\tau,\sigma)}{p(\tau)\,p(\sigma)}$$;
[0103] In the formula, $I_{TS,k}$ is the mutual information of temperature and salinity within the $k$-th time window; $p(\tau,\sigma)$ is the joint probability of temperature value $\tau$ and salinity value $\sigma$; $p(\tau)$ and $p(\sigma)$ are the marginal probabilities of temperature value $\tau$ and salinity value $\sigma$.
[0104] The parameter association network graph can be represented as an adjacency matrix $A$:
[0105] $$A=\begin{bmatrix} 0 & w_{TS} & w_{TD}\\ w_{TS} & 0 & w_{SD}\\ w_{TD} & w_{SD} & 0\end{bmatrix}$$;
[0106] In the formula, $w_{TS}$, $w_{TD}$, and $w_{SD}$ represent the correlation strengths between temperature-salinity, temperature-depth, and salinity-depth, respectively, which can be expressed as the average or weighted sum of the correlation coefficients or mutual information over the time windows.
[0107] The key parameter association paths extracted by the minimum spanning tree algorithm can be represented as an edge set $E_{\mathrm{MST}}$:
[0108] $$E_{\mathrm{MST}}=\{e_1,e_2,\dots,e_{V-1}\}$$;
[0109] In the formula, $e_i$ denotes an edge to be retained; $V=3$ is the number of nodes, which is the number of parameters; the minimum spanning tree contains $V-1=2$ edges.
[0110] Correlation coefficients and mutual information are used as association measures because they quantify the dependencies between variables from linear and nonlinear perspectives, respectively. Correlation coefficients are simple to calculate and suitable for capturing linear relationships; mutual information can reveal more complex nonlinear relationships. The minimum spanning tree algorithm is used to simplify the network structure, retain the most important relationships, and reduce subsequent computational complexity.
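The construction of the association graph and the extraction of key paths can be sketched as below, using the Pearson coefficient as the association strength and Kruskal's algorithm for the spanning tree. One assumption is made explicit here: the edge weight passed to the spanning tree is taken as 1 − |strength|, so that minimizing total weight retains the strongest associations. The description does not state how strengths are converted to weights, and the sample profiles are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def key_paths(strengths):
    """Kruskal's algorithm on assumed weights 1 - |strength|; returns kept edges."""
    parent = {n: n for edge in strengths for n in edge}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    tree = []
    for (u, v), _ in sorted(strengths.items(), key=lambda kv: 1.0 - abs(kv[1])):
        ru, rv = find(u), find(v)
        if ru != rv:              # edge joins two components, so keep it
            parent[ru] = rv
            tree.append((u, v))
    return tree

# Hypothetical profile: temperature falls and salinity rises with depth.
T = [20.0, 19.5, 18.0, 15.0, 12.0]
S = [34.0, 34.1, 34.4, 34.9, 35.2]
D = [5.0, 10.0, 20.0, 40.0, 60.0]
strengths = {("T", "S"): pearson(T, S), ("T", "D"): pearson(T, D), ("S", "D"): pearson(S, D)}
edges = key_paths(strengths)
```

For three nodes the tree always keeps two of the three edges, dropping the weakest association that is not needed to keep the graph connected.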
[0111] The specific representation of the multi-parameter coupled equation system in step S05 is as follows:
[0112] Temperature-salinity relationship equation:
[0113] $$C^{\mathrm{th}}_{TS,k}=\alpha_1 f_{TS}(T_k,S_k)+\beta_1 g_{TS}(H_{TS},v_T,\rho)+\varepsilon_{TS}$$;
[0114] In the formula, $C^{\mathrm{th}}_{TS,k}$ is the theoretical temperature-salinity coupling value within the $k$-th time window; $T_k$ and $S_k$ are the temperature and salinity data segments for this window, respectively; $f_{TS}$ is the temperature-salinity data mapping function; $H_{TS}$ is the temperature-salinity historical correlation database; $v_T$ is the rate of change of ambient temperature; $\rho$ is the seawater density parameter; $g_{TS}$ is the comprehensive mapping function between historical data and environmental parameters; $\alpha_1$ and $\beta_1$ are weighting coefficients satisfying $\alpha_1+\beta_1=1$; and $\varepsilon_{TS}$ is the temperature-salinity coupling error term.
[0115] Temperature-depth relationship equation:
[0116] $$C^{\mathrm{th}}_{TD,k}=\alpha_2 f_{TD}(T_k,D_k)+\beta_2 g_{TD}(H_{TD},\gamma_D,v_P)+\varepsilon_{TD}$$;
[0117] In the formula, $C^{\mathrm{th}}_{TD,k}$ is the theoretical temperature-depth coupling value within the $k$-th time window; $T_k$ and $D_k$ are the temperature and depth data segments for that window, respectively; $f_{TD}$ is the temperature-depth data mapping function; $H_{TD}$ is the temperature-depth historical correlation database; $\gamma_D$ is the depth gradient coefficient; $v_P$ is the rate of change of water pressure; $g_{TD}$ is the comprehensive mapping function between historical data and environmental parameters; $\alpha_2$ and $\beta_2$ are weighting coefficients satisfying $\alpha_2+\beta_2=1$; and $\varepsilon_{TD}$ is the temperature-depth coupling error term.
[0118] Salinity-depth relationship equation:
[0119] $$C^{\mathrm{th}}_{SD,k}=\alpha_3 f_{SD}(S_k,D_k)+\beta_3 g_{SD}(H_{SD},\lambda_D,u_c)+\varepsilon_{SD}$$;
[0120] In the formula, $C^{\mathrm{th}}_{SD,k}$ is the theoretical salinity-depth coupling value for the $k$-th time window; $S_k$ and $D_k$ are the salinity and depth data segments for this window, respectively; $f_{SD}$ is the salinity-depth data mapping function; $H_{SD}$ is the salinity-depth historical correlation database; $\lambda_D$ is the depth layering coefficient; $u_c$ is the ocean current influencing factor; $g_{SD}$ is the comprehensive mapping function between historical data and environmental parameters; $\alpha_3$ and $\beta_3$ are weighting coefficients satisfying $\alpha_3+\beta_3=1$; and $\varepsilon_{SD}$ is the salinity-depth coupling error term.
[0121] Comprehensive equation relating temperature, salinity, and depth:
[0122] $$C^{\mathrm{th}}_{TSD,k}=\alpha_4 f_{TSD}(T_k,S_k,D_k)+\beta_4 g_{TSD}(H_{TSD},\Theta)+\varepsilon_{TSD}$$;
[0123] In the formula, $C^{\mathrm{th}}_{TSD,k}$ is the three-parameter comprehensive theoretical coupling value for the $k$-th time window; $T_k$, $S_k$, and $D_k$ are the temperature, salinity, and depth data segments for that window, respectively; $f_{TSD}$ is the three-parameter data mapping function; $H_{TSD}$ is the temperature-salinity-depth historical correlation database; $\Theta$ is the parameter set of the seawater equation of state; $g_{TSD}$ is the comprehensive mapping function between historical data and the equation of state; $\alpha_4$ and $\beta_4$ are weighting coefficients satisfying $\alpha_4+\beta_4=1$; and $\varepsilon_{TSD}$ is the three-parameter comprehensive coupling error term.
[0124] The methods for obtaining each parameter are as follows:
[0125] The ambient temperature change rate $v_T$ is calculated from the temperature difference between consecutive time windows: $v_T=(\bar{T}_k-\bar{T}_{k-1})/\Delta t$, where $\bar{T}_k$ is the average temperature of the $k$-th window and $\Delta t$ is the time window interval.
[0126] The seawater density parameter $\rho$ is calculated from the international equation of state of seawater: $\rho=\rho(T,S,p)$, where $T$ is temperature, $S$ is salinity, and $p$ is pressure.
[0127] The depth gradient coefficient $\gamma_D$ is calculated from the rate of change of depth between consecutive sampling points: $\gamma_D=(D_{j+1}-D_j)/(t_{j+1}-t_j)$.
[0128] The water pressure change rate $v_P$ is calculated from depth changes: $v_P=\rho g\,\dfrac{\mathrm{d}D}{\mathrm{d}t}$, where $g$ is the acceleration due to gravity.
[0129] The depth layering coefficient $\lambda_D$ is calculated from the stratification characteristics of salinity with depth.
[0130] The ocean current influencing factor $u_c$ is obtained from regional ocean current databases or measured using an acoustic Doppler current profiler.
[0131] The seawater equation of state parameter set $\Theta$ includes various thermodynamic coefficients, obtained according to international oceanographic convention standards.
[0132] The historical correlation database is built using long-term observation data and includes parameter correlation characteristics for different sea areas and seasons.
[0133] The linear combination of these equations is chosen to consider both the actual characteristics of the current data (first term) and the constraints of historical patterns and physical models (second term). The weighting coefficients can be adjusted to reflect the relative importance of both factors based on actual needs. The inclusion of an error term addresses the uncertainty of the model and the influence of random factors. The overall equation design follows fundamental principles of ocean physics and effectively captures the complex relationships between parameters.
[0134] The calculation of the actual coupling value in step S06 is specifically expressed as follows:
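[0135] The actual temperature-salinity coupling value is calculated using covariance:
[0136] $$C^{\mathrm{act}}_{TS,k}=\frac{1}{M}\sum_{j=1}^{M}\left(T_{k,j}-\bar{T}_k\right)\left(S_{k,j}-\bar{S}_k\right)$$;
[0137] In the formula, $C^{\mathrm{act}}_{TS,k}$ is the actual temperature-salinity coupling value for the $k$-th time window; $T_{k,j}$ and $S_{k,j}$ are the temperature and salinity values at the $j$-th sampling point in the window; $\bar{T}_k$ and $\bar{S}_k$ are the average values of temperature and salinity within that window, respectively; and $M$ is the number of sampling points within the window.
[0138] Alternatively, the cross-correlation coefficient can be used:
[0139] $$C^{\mathrm{act}}_{TS,k}=\frac{\sum_{j=1}^{M}\left(T_{k,j}-\bar{T}_k\right)\left(S_{k,j}-\bar{S}_k\right)}{\sqrt{\sum_{j=1}^{M}\left(T_{k,j}-\bar{T}_k\right)^{2}\sum_{j=1}^{M}\left(S_{k,j}-\bar{S}_k\right)^{2}}}$$;
[0140] Similarly, the actual temperature-depth coupling value $C^{\mathrm{act}}_{TD,k}$ and the actual salinity-depth coupling value $C^{\mathrm{act}}_{SD,k}$ can be calculated in the same way.
[0141] The actual three-parameter coupling value is calculated using canonical correlation analysis:
[0142] $$C^{\mathrm{act}}_{TSD,k}=\max_{u,v}\operatorname{corr}\left(X_k u,\;Y_k v\right)$$;
[0143] In the formula, $C^{\mathrm{act}}_{TSD,k}$ is the actual three-parameter coupling value for the $k$-th time window; $X_k$ and $Y_k$ are the parameter grouping matrices within the window; $u$ and $v$ are normalized vectors; and $\operatorname{corr}(\cdot,\cdot)$ is the correlation coefficient function.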
[0144] The reason for choosing covariance and correlation coefficient to calculate the actual coupling value is that they directly reflect the statistical correlation between parameters, are simple to calculate, and have clear physical meaning. Covariance reflects the absolute correlation strength, while correlation coefficient provides a normalized measure of correlation. For the combined relationship of three parameters, canonical correlation analysis can discover the maximum correlation pattern among multiple variables, making it suitable for handling high-dimensional data association problems.
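The two pairwise coupling measures can be sketched as follows. Dividing by the number of sampling points (population covariance) is an assumption, since the normalization factor is not recoverable from the text, and the sample window is invented for illustration.

```python
import math

def covariance(x, y):
    """Population covariance within one window (1/M normalization assumed)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def correlation(x, y):
    """Covariance normalized by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical window: temperature falls linearly while salinity rises linearly.
T = [20.0, 19.0, 18.0, 17.0]
S = [34.0, 34.2, 34.4, 34.6]
c_cov = covariance(T, S)     # depends on the units of both parameters
c_corr = correlation(T, S)   # dimensionless, bounded by -1 and 1
```

The covariance carries the scale of the measurements, whereas the correlation coefficient is normalized, which matches the distinction drawn in the paragraph above.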
[0145] The calculation of the coupling deviation matrix in step S07 is specifically represented as follows:
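[0146] The temperature-salinity coupling deviation is calculated as:
[0147] $$\delta_{TS,k}=\left|C^{\mathrm{th}}_{TS,k}-C^{\mathrm{act}}_{TS,k}\right|$$;
[0148] In the formula, $\delta_{TS,k}$ is the temperature-salinity coupling deviation within the $k$-th time window; $C^{\mathrm{th}}_{TS,k}$ and $C^{\mathrm{act}}_{TS,k}$ are the theoretical and actual temperature-salinity coupling values for this window, respectively.
[0149] Similarly, the temperature-depth coupling deviation $\delta_{TD,k}$, the salinity-depth coupling deviation $\delta_{SD,k}$, and the three-parameter comprehensive coupling deviation $\delta_{TSD,k}$ can be calculated in the same way.
[0150] The coupling deviation matrix $\Delta_C$ can be represented as:
[0151] $$\Delta_C=\begin{bmatrix} \delta_{TS,1} & \delta_{TD,1} & \delta_{SD,1} & \delta_{TSD,1}\\ \delta_{TS,2} & \delta_{TD,2} & \delta_{SD,2} & \delta_{TSD,2}\\ \vdots & \vdots & \vdots & \vdots\\ \delta_{TS,K} & \delta_{TD,K} & \delta_{SD,K} & \delta_{TSD,K}\end{bmatrix}$$;
[0152] The normalization uses the min-max method:
[0153] $$\delta'=\frac{\delta-\delta_{\min}}{\delta_{\max}-\delta_{\min}}$$;
[0154] In the formula, $\delta'$ is the normalized coupling deviation value; $\delta$ is the original coupling deviation value; $\delta_{\min}$ and $\delta_{\max}$ are the minimum and maximum values of all coupling deviations, respectively.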
[0155] Absolute differences are used in the coupling deviation calculation because they directly quantify the inconsistency between theoretical expectations and actual observations, and are simple to calculate and easy to interpret. Normalization brings the different types of deviation onto the same scale, facilitating subsequent comprehensive analysis and comparison.
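The absolute-difference deviation and min-max normalization can be sketched as follows. The theoretical and actual coupling values are invented for illustration, and returning zeros when all deviations are equal is an assumed convention for the degenerate case.

```python
def coupling_deviation(theoretical, actual):
    """Absolute difference between theoretical and actual coupling values."""
    return [abs(t - a) for t, a in zip(theoretical, actual)]

def min_max_normalize(values):
    """Scale values to [0, 1]; returns zeros if all values are equal (assumed)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical coupling values for four time windows; the third window is anomalous.
theoretical = [-0.50, -0.48, -0.52, -0.49]
actual = [-0.51, -0.47, 0.30, -0.50]
raw = coupling_deviation(theoretical, actual)
normalized = min_max_normalize(raw)
```

After normalization the anomalous window maps to 1 and the consistent windows map to values near 0, which places every deviation type on the same scale before it enters the feature vector.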
[0156] Step S08 uses a pre-trained temperature, salinity, and depth error labeling neural network model to replace the traditional error determination function. The deviation value obtained in step S03 and the coupling deviation matrix obtained in step S07 are input into the temperature, salinity, and depth error labeling neural network model to calculate the error determination value.
[0157] In step S09, the error judgment threshold is generally set within the range 0.7-0.85. The specific value is a trade-off between application requirements and the acceptable false positive / false negative rates. The error judgment rule can be expressed as:
[0158] $$\mathrm{Flag}_k=\begin{cases}1, & E_k>\theta\\ 0, & E_k\le\theta\end{cases}$$;
[0159] In the formula, $\mathrm{Flag}_k$ is the error marker for the $k$-th time window or data point; $E_k$ is the error judgment value; and $\theta$ is the preset threshold.
[0160] Error marking can be divided into different levels:
[0161] $$\mathrm{Level}_k=\begin{cases}1, & 0.7\le E_k<0.85\\ 2, & 0.85\le E_k<0.95\\ 3, & 0.95\le E_k\le 1.0\end{cases}$$;
[0162] In the formula, $\mathrm{Level}_k$ is the error level for the $k$-th time window or data point; 1, 2, and 3 represent minor, moderate, and severe anomalies, respectively.
[0163] The threshold method is used for error determination because it implements a simple and clear binary classification strategy, which facilitates automated processing. Multi-level error labeling provides a more detailed classification of anomalies, which helps with subsequent differential processing.
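The threshold rule and the three-level grading can be sketched as follows. Using level 0 for values below 0.7 and treating the lower bound of each band as inclusive are assumptions; the description gives only the band limits.

```python
def mark_error(value, threshold=0.7):
    """Binary flag: 1 if the error judgment value exceeds the threshold."""
    return 1 if value > threshold else 0

def error_level(value):
    """0 = not graded (assumed), 1 = minor, 2 = moderate, 3 = severe."""
    if value >= 0.95:
        return 3
    if value >= 0.85:
        return 2
    if value >= 0.7:
        return 1
    return 0

# Hypothetical error judgment values for four time windows.
scores = [0.20, 0.72, 0.90, 0.97]
flags = [mark_error(s, threshold=0.8) for s in scores]
levels = [error_level(s) for s in scores]
```

Keeping the binary flag and the graded level as separate functions lets downstream processing choose between removal, correction, or weight reduction per level.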
[0164] Optionally, for a set of parameters, the deviation value can be quantified using Euclidean distance or Manhattan distance. For example, the temperature deviation value is calculated as follows:
[0165] Euclidean distance calculation method:
[0166] $$d_{T,k}=\sqrt{\sum_{j=1}^{M}\left(T_{k,j}-\hat{T}(t_{k,j})\right)^{2}}$$;
[0167] In the formula, $T_{k,j}$ is the actual temperature value at the $j$-th sampling point within the $k$-th time window; $\hat{T}(t_{k,j})$ is the fitted temperature value at the corresponding time point; and $M$ is the number of sampling points within the window.
[0168] Manhattan distance calculation method:
[0169] $$d_{T,k}=\sum_{j=1}^{M}\left|T_{k,j}-\hat{T}(t_{k,j})\right|$$;
[0170] The theoretical basis for deviation values lies in quantifying the degree of difference between actual measurement data and theoretical model predictions, providing a basic metric for subsequent error identification. Euclidean distance considers linear distance in space and is sensitive to outliers; Manhattan distance reflects cumulative deviation and is more sensitive to persistent shifts.
[0171] Specifically, the technical principle of this invention is a data error identification method that combines a parameter correlation network with deep learning. First, the acquisition curve function matrices of the three parameters (temperature, salinity, and depth) are established to capture how each parameter varies over time or space. Then, the newly sampled data is segmented into time windows, and the deviation value between each parameter segment and the corresponding acquisition curve function matrix is calculated to initially screen out data intervals that may contain anomalies.
[0172] The core innovation of this invention lies in constructing a network diagram relating temperature, salinity, and depth, and using a minimum spanning tree algorithm to extract key parameter correlation paths, establishing a set of multi-parameter coupling equations. This step fully considers the physical coupling relationships between temperature, salinity, and depth in seawater. For example, the temperature-salinity equation describes the density correlation between temperature and salinity, the temperature-depth equation reflects the stratification characteristics of temperature with depth, and the salinity-depth equation reflects the variation law of salinity with depth. Through these equations, the system can calculate theoretical coupling values and compare them with actual coupling values calculated from actual measurement data to generate a coupling deviation matrix.
[0173] Furthermore, this invention introduces a mixture-of-experts neural network model based on a multi-head sparse attention mechanism to replace the traditional error determination function. This model, pre-trained on a large amount of historical data, can learn the characteristic patterns of different types of errors while also addressing the problem of imbalanced samples. The multi-head sparse attention mechanism enables the model to attend to data features across different time windows simultaneously and to adapt to the sparsity of the coupling deviation matrix, significantly improving the model's ability to identify subtle anomalies.
[0174] This approach, which combines parametric physical correlation and deep learning, enables the system to distinguish between normal environmental fluctuations and instrument measurement errors, thereby achieving high-precision automatic error identification and labeling. It solves the problem of multi-parameter coupling anomalies that is difficult to handle with traditional methods.
[0175] The following provides a specific embodiment 1 of the present invention, and the specific implementation of each step in this embodiment 1 is described in detail below.
[0176] The specific implementation of step S01 involves constructing the function matrices of the temperature, salinity, and depth parameter acquisition curves from historical measurement data. First, a large amount of historical temperature, salinity, and depth measurement data is collected and sorted by time series or spatial distribution to ensure data continuity and representativeness. For each parameter's historical data, polynomial fitting is performed using the least squares method. The order of the fitting polynomial is selected according to the data complexity, typically 3-5. The temperature parameter acquisition curve can be expressed by polynomial fitting as $T(t)=\sum_{i=0}^{n} a_i t^{i}+\varepsilon_T$, where $T(t)$ is the fitted temperature function; $t$ is the time variable; $a_i$ are the temperature polynomial coefficients; $n$ is the order of the polynomial, typically taken as 3-5; and $\varepsilon_T$ is the temperature fitting error term. The salinity parameter acquisition curve can be expressed as $S(t)=\sum_{i=0}^{n} b_i t^{i}+\varepsilon_S$, where $S(t)$ is the fitted salinity function; $b_i$ are the salinity polynomial coefficients; and $\varepsilon_S$ is the salinity fitting error term. The depth parameter acquisition curve can be expressed as $D(t)=\sum_{i=0}^{n} c_i t^{i}+\varepsilon_D$, where $D(t)$ is the fitted depth function; $c_i$ are the depth polynomial coefficients; and $\varepsilon_D$ is the depth fitting error term. All fitted functions are organized into matrix form. The temperature parameter acquisition curve function matrix can be represented as $F_T=\left[a_{p,i}\right]_{m\times(n+1)}$, where $a_{p,i}$ denotes the value of the $i$-th temperature polynomial coefficient for the $p$-th time period or depth segment; $m$ is the number of time periods or depth segments; and $n$ is the order of the polynomial. Similarly, the salinity and depth parameter acquisition curve function matrices can be represented as $F_S=\left[b_{p,i}\right]_{m\times(n+1)}$ and $F_D=\left[c_{p,i}\right]_{m\times(n+1)}$. The polynomial coefficients are obtained using the least squares method. Taking temperature as an example, the optimization problem $\min_{a_0,\dots,a_n}\sum_{j=1}^{N}\left(T_j-\hat{T}(t_j)\right)^{2}$ is solved, where $T_j$ is the actual temperature value at the $j$-th time point; $\hat{T}(t_j)$ is the fitted temperature value at the corresponding time point; and $N$ is the number of historical data points. The least squares method obtains the best-fit coefficients by minimizing the sum of squared differences between the observed and fitted values. This captures both the overall trend and the local variation characteristics of the data. The purpose of this step is to establish a benchmark model of parameter changes that serves as the reference for subsequent deviation calculations: by fitting the regular characteristics of historical data, abnormal change patterns in new data can be identified.
[0177] The specific implementation of step S02 involves segmenting the newly sampled data into time windows. First, an appropriate time window length is determined, typically based on the data sampling frequency and the variability of the marine environment, with a typical value of 5-15 minutes. The continuously collected temperature, salinity, and depth data are then divided into data segments according to the set time window. For temperature data, a temperature parameter segment matrix $X_T=\left[T_{k,j}\right]_{K\times M}$ is formed, where $T_{k,j}$ denotes the temperature value at the $j$-th sampling point within the $k$-th time window; $K$ is the number of time windows; and $M$ is the number of sampling points within each window. Similarly, the salinity parameter segment matrix $X_S$ and the depth parameter segment matrix $X_D$ are formed. Statistical features, including mean, variance, slope, and kurtosis, are extracted from each data segment and used as the feature vector for that segment. Time window segmentation decomposes a long time series into shorter periods, facilitating local analysis of parameter variation characteristics. The choice of window length must balance resolution and stability: too short a window cannot reflect the complete trend, while too long a window may mask local anomalies. The purpose of this step is to improve the temporal resolution and sensitivity of anomaly detection.
[0178] The specific implementation of step S03 involves calculating the deviation between the parameter segment matrix and the parameter acquisition curve function matrix. For each temperature parameter segment, the Euclidean distance or Manhattan distance is calculated between the segment and the values predicted by the temperature parameter acquisition curve function over the corresponding time interval. Using Euclidean distance, the temperature deviation is $d_{T,k}=\sqrt{\sum_{j=1}^{M}\left(T_{k,j}-\hat{T}(t_{k,j})\right)^{2}}$, where $T_{k,j}$ is the actual temperature value at the $j$-th sampling point within the $k$-th time window and $\hat{T}(t_{k,j})$ is the fitted temperature value at the corresponding time point. Alternatively, using Manhattan distance, $d_{T,k}=\sum_{j=1}^{M}\left|T_{k,j}-\hat{T}(t_{k,j})\right|$. The salinity deviation value $d_{S,k}$ and the depth deviation value $d_{D,k}$ are calculated in the same way. The calculated deviation values are then organized into a deviation value matrix. Euclidean distance and Manhattan distance measure the degree of deviation from geometric and path perspectives, respectively; the former is more sensitive to outliers, while the latter is more sensitive to persistent shifts. The purpose of this step is to quantify the difference between the newly sampled data and historical patterns, providing the basis for error identification. A larger deviation value indicates that the data departs further from historical patterns and that an anomaly is more probable.
[0179] The specific implementation of step S04 involves constructing a parameter correlation network graph and establishing a system of multi-parameter coupled equations. First, temperature, salinity, and depth are used as nodes in the network graph. Based on historical data, the mutual information or correlation coefficients between the parameters are calculated as the edge weights. The Pearson correlation coefficient between the parameters is calculated as follows: $r_{TS}^{(i)} = \frac{\sum_{j=1}^{n}(T_{i,j}-\bar{T}_i)(S_{i,j}-\bar{S}_i)}{\sqrt{\sum_{j=1}^{n}(T_{i,j}-\bar{T}_i)^2}\,\sqrt{\sum_{j=1}^{n}(S_{i,j}-\bar{S}_i)^2}}$. In the formula, $r_{TS}^{(i)}$ is the correlation coefficient between temperature and salinity within the $i$-th time window; $T_{i,j}$ and $S_{i,j}$ are the temperature and salinity values at the $j$-th sampling point; $\bar{T}_i$ and $\bar{S}_i$ are the average values of temperature and salinity within that window, respectively. The same calculation is performed for $r_{TD}^{(i)}$ and $r_{SD}^{(i)}$. Alternatively, mutual information can be used to calculate the correlation strength of the parameters: $I_{TS}^{(i)} = \sum_{t}\sum_{s} p(t,s)\log\frac{p(t,s)}{p(t)\,p(s)}$. In the formula, $I_{TS}^{(i)}$ is the mutual information of temperature and salinity within the $i$-th time window; $p(t,s)$ is the joint probability of temperature value $t$ and salinity value $s$; $p(t)$ and $p(s)$ are the marginal probabilities of temperature value $t$ and salinity value $s$. The parameter correlation network graph can be represented as a symmetric $3\times 3$ adjacency matrix $A$ whose off-diagonal entries are $w_{TS}$, $w_{TD}$, and $w_{SD}$. These entries represent the correlation strengths between temperature-salinity, temperature-depth, and salinity-depth, respectively, and can be calculated as the average or weighted sum of the correlation coefficients or mutual information over the time windows. Minimum spanning tree algorithms, such as Prim's algorithm or Kruskal's algorithm, are used to extract key parameter correlation paths from the parameter correlation network graph, retaining the edges with the highest correlation strength. The key parameter correlation paths extracted based on the minimum spanning tree algorithm can be represented as an edge set $E_{MST} = \{e_1, \ldots, e_{N-1}\}$. In the formula, $e_k$ indicates a retained edge; $N = 3$ is the number of nodes, which is the number of parameters; the minimum spanning tree contains $N-1 = 2$ edges.
Based on the extracted key correlation paths, a multi-parameter relationship coupling equation system is established, including temperature-salinity relationship equations, temperature-depth relationship equations, salinity-depth relationship equations, and a comprehensive temperature-salinity-depth relationship equation. The principle of constructing the parameter correlation network graph is to represent the relationships between parameters using graph theory, while the minimum spanning tree algorithm extracts the most important correlations by retaining only the $N-1$ edges with the smallest total weight, where a stronger correlation is assigned a smaller weight, thus reducing complexity. The purpose of this step is to explore the inherent physical correlations between parameters, laying the foundation for subsequent coupling deviation analysis.
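The path extraction in step S04 can be sketched with Kruskal's algorithm. This sketch rests on two assumptions not stated in the text: the edge distance is taken as 1 minus the correlation strength, so that the minimum spanning tree keeps the strongest edges, and the strengths are the absolute values of the first-window correlation coefficients reported in Example 2 (0.78, 0.92, 0.85). The function name is hypothetical.

```python
def strongest_spanning_tree(nodes, weights):
    """Kruskal's algorithm on distance = 1 - strength (an assumed convention).

    weights maps frozenset({u, v}) to a correlation strength in [0, 1].
    Returns the retained edges as (u, v, strength) tuples.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the root, compressing the path on the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    kept = []
    # Smallest distance first, i.e. strongest correlation first.
    for edge, strength in sorted(weights.items(), key=lambda kv: 1.0 - kv[1]):
        u, v = sorted(edge)
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # the edge joins two separate components
            parent[root_u] = root_v
            kept.append((u, v, strength))
    return kept

nodes = ["T", "S", "D"]               # temperature, salinity, depth
weights = {
    frozenset({"T", "S"}): 0.78,
    frozenset({"T", "D"}): 0.92,
    frozenset({"S", "D"}): 0.85,
}
tree = strongest_spanning_tree(nodes, weights)
print(tree)
```

With three nodes the tree always has two edges; for these strengths the temperature-depth and salinity-depth edges are kept and the weaker temperature-salinity edge is dropped.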
[0180] The specific implementation of step S05 involves calculating the theoretical coupling value using a set of multi-parameter relationship coupling equations.
The temperature-salinity relationship equation is expressed as: $C_{TS}^{\mathrm{th}}(i) = \alpha_1 f_{TS}(T_i, S_i) + \alpha_2 g_{TS}(H_{TS}, v_T, \rho) + \varepsilon_{TS}$. In the formula, $C_{TS}^{\mathrm{th}}(i)$ is the theoretical temperature-salinity coupling value within the $i$-th time window; $T_i$ and $S_i$ are the temperature and salinity data segments for this window, respectively; $f_{TS}$ is the temperature-salinity data mapping function; $H_{TS}$ is the temperature-salinity historical correlation database; $v_T$ is the rate of change of ambient temperature; $\rho$ is the seawater density parameter; $g_{TS}$ is the comprehensive mapping function between historical data and environmental parameters; $\alpha_1$ and $\alpha_2$ are weighting coefficients satisfying $\alpha_1 + \alpha_2 = 1$; and $\varepsilon_{TS}$ is the temperature-salinity coupling error term.
The temperature-depth relationship equation is expressed as: $C_{TD}^{\mathrm{th}}(i) = \beta_1 f_{TD}(T_i, D_i) + \beta_2 g_{TD}(H_{TD}, k_D, v_P) + \varepsilon_{TD}$. In the formula, $C_{TD}^{\mathrm{th}}(i)$ is the theoretical temperature-depth coupling value within the $i$-th time window; $T_i$ and $D_i$ are the temperature and depth data segments for that window, respectively; $f_{TD}$ is the temperature-depth data mapping function; $H_{TD}$ is the temperature-depth historical correlation database; $k_D$ is the depth gradient coefficient; $v_P$ is the rate of change of water pressure; $g_{TD}$ is the comprehensive mapping function between historical data and environmental parameters; $\beta_1$ and $\beta_2$ are weighting coefficients satisfying $\beta_1 + \beta_2 = 1$; and $\varepsilon_{TD}$ is the temperature-depth coupling error term.
The salinity-depth relationship equation is expressed as: $C_{SD}^{\mathrm{th}}(i) = \theta_1 f_{SD}(S_i, D_i) + \theta_2 g_{SD}(H_{SD}, \lambda_D, \mu_c) + \varepsilon_{SD}$. In the formula, $C_{SD}^{\mathrm{th}}(i)$ is the theoretical salinity-depth coupling value within the $i$-th time window; $S_i$ and $D_i$ are the salinity and depth data segments for this window, respectively; $f_{SD}$ is the salinity-depth data mapping function; $H_{SD}$ is the salinity-depth historical correlation database; $\lambda_D$ is the depth stratification coefficient; $\mu_c$ is the ocean current influence factor; $g_{SD}$ is the comprehensive mapping function between historical data and environmental parameters; $\theta_1$ and $\theta_2$ are weighting coefficients satisfying $\theta_1 + \theta_2 = 1$; and $\varepsilon_{SD}$ is the salinity-depth coupling error term.
The comprehensive temperature-salinity-depth relationship equation is expressed as: $C_{TSD}^{\mathrm{th}}(i) = \varphi_1 f_{TSD}(T_i, S_i, D_i) + \varphi_2 g_{TSD}(H_{TSD}, \Theta_{EOS}) + \varepsilon_{TSD}$. In the formula, $C_{TSD}^{\mathrm{th}}(i)$ is the three-parameter comprehensive theoretical coupling value within the $i$-th time window; $T_i$, $S_i$, and $D_i$ are the temperature, salinity, and depth data segments for that window, respectively; $f_{TSD}$ is the three-parameter data mapping function; $H_{TSD}$ is the temperature-salinity-depth historical correlation database; $\Theta_{EOS}$ is the parameter set of the seawater equation of state; $g_{TSD}$ is the comprehensive mapping function between historical data and the equation of state; $\varphi_1$ and $\varphi_2$ are weighting coefficients satisfying $\varphi_1 + \varphi_2 = 1$; and $\varepsilon_{TSD}$ is the three-parameter comprehensive error term.
The ambient temperature change rate $v_T$ is calculated from the temperature difference between consecutive time windows: $v_T = (\bar{T}_i - \bar{T}_{i-1})/\Delta t$, where $\bar{T}_i$ is the average temperature of the $i$-th window and $\Delta t$ is the time window interval. The seawater density parameter $\rho$ is calculated from the international equation of state of seawater: $\rho = \rho(T, S, P)$, where $T$ is temperature, $S$ is salinity, and $P$ is pressure. The depth gradient coefficient $k_D$ is calculated from the depth change rate between consecutive sampling points. The water pressure change rate $v_P$ is calculated from depth changes and the gravitational acceleration $g$. The depth stratification coefficient $\lambda_D$ is calculated from the stratification of salinity with depth. The ocean current influence factor $\mu_c$ is obtained from regional ocean current databases or measured using acoustic Doppler current profilers. The parameter set $\Theta_{EOS}$ of the seawater equation of state includes various thermodynamic coefficients, obtained from international oceanographic convention standards.
The principle of the multi-parameter coupling equation system is to establish the mapping relationship between parameters by combining physical models and data-driven methods. The equations adopt a linear combination form, taking into account both current data characteristics and historical constraints. The purpose of this step is to predict the ideal correlation state between parameters based on historical data and physical models, providing a benchmark for comparison with actual coupling values.
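The linear-combination form of the coupling equations can be sketched as below. The mapping functions themselves are not specified in the text, so their outputs are passed in as plain numbers; the function names and the term values 0.8 and 0.6 are hypothetical. The weights 0.7 and 0.3 are the temperature-salinity coefficients given in Example 2, and the temperature change rate follows the description of a difference between consecutive window means divided by the window interval.

```python
def theoretical_coupling(data_term, history_term, w_data, w_hist, error_term=0.0):
    """Weighted sum of a data-driven term and a history/environment term.

    data_term and history_term stand in for the outputs of the two mapping
    functions, which the text does not define in closed form.
    """
    if abs(w_data + w_hist - 1.0) > 1e-9:
        raise ValueError("weighting coefficients must sum to 1")
    return w_data * data_term + w_hist * history_term + error_term

def temperature_change_rate(mean_previous, mean_current, window_interval):
    """Difference of consecutive window means divided by the window interval."""
    return (mean_current - mean_previous) / window_interval

# Hypothetical mapping outputs combined with the Example 2 weights.
c_ts = theoretical_coupling(data_term=0.8, history_term=0.6, w_data=0.7, w_hist=0.3)
# Hypothetical window means, 10-minute windows; result in degrees C per minute.
v_t = temperature_change_rate(15.00, 15.26, 10.0)
print(round(c_ts, 3), round(v_t, 3))
```

Rejecting weights that do not sum to 1 is a design choice of this sketch; it simply enforces the constraint stated for each pair of weighting coefficients.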
[0181] The specific implementation of step S06 involves calculating the actual coupling value between the parameter segment matrices in the newly sampled data. The actual temperature-salinity coupling value is calculated using covariance: $C_{TS}^{\mathrm{act}}(i) = \frac{1}{n}\sum_{j=1}^{n}(T_{i,j}-\bar{T}_i)(S_{i,j}-\bar{S}_i)$. In the formula, $C_{TS}^{\mathrm{act}}(i)$ is the actual temperature-salinity coupling value for the $i$-th time window; $T_{i,j}$ and $S_{i,j}$ are the temperature and salinity values at the $j$-th sampling point in the window; $\bar{T}_i$ and $\bar{S}_i$ are the average values of temperature and salinity within that window, respectively; and $n$ is the number of sampling points within the window. Alternatively, it can be calculated using the cross-correlation coefficient. Similarly, the actual temperature-depth coupling value $C_{TD}^{\mathrm{act}}(i)$ and the actual salinity-depth coupling value $C_{SD}^{\mathrm{act}}(i)$ are calculated. The actual coupling value of the three parameters is calculated using canonical correlation analysis: $C_{TSD}^{\mathrm{act}}(i) = \max_{a,b}\ \mathrm{corr}(X_i a,\ Y_i b)$. In the formula, $C_{TSD}^{\mathrm{act}}(i)$ is the actual three-parameter coupling value for the $i$-th time window; $X_i$ and $Y_i$ are the parameter grouping matrices within the window; $a$ and $b$ are normalized projection vectors; and $\mathrm{corr}$ is the correlation coefficient function. The principle of calculating the actual coupling value is to directly extract the statistical correlation between parameters from the measured data. The covariance reflects the absolute correlation strength, the correlation coefficient provides a normalized correlation measure, and canonical correlation analysis finds the maximum correlation pattern among multiple variables. This step provides a practical basis for comparison with the theoretical coupling value.
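The covariance-based actual coupling value of step S06, together with its normalized form, can be sketched as follows. Dividing by the number of sampling points (rather than by one less) is an assumption of this sketch, and the temperature and depth series are hypothetical. The three-parameter canonical correlation analysis is not covered here.

```python
import math
from statistics import fmean

def covariance(xs, ys):
    """Covariance of two equal-length series, dividing by n (assumed convention)."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Covariance normalized by both standard deviations (Pearson coefficient)."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical window: temperature falls linearly as depth increases.
temperature = [15.2, 15.0, 14.8, 14.6]
depth = [10.0, 12.0, 14.0, 16.0]

print(covariance(temperature, depth))
print(correlation(temperature, depth))
```

For this perfectly linear toy window the correlation is -1 up to rounding, while the covariance carries the physical units of both series, which is the distinction the paragraph draws between absolute and normalized measures.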
[0182] The specific implementation of step S07 involves calculating the coupling deviation matrix between the theoretical coupling value and the actual coupling value. The temperature-salinity coupling deviation is calculated as: $\delta_{TS}(i) = \left|C_{TS}^{\mathrm{th}}(i) - C_{TS}^{\mathrm{act}}(i)\right|$. In the formula, $\delta_{TS}(i)$ is the temperature-salinity coupling deviation within the $i$-th time window; $C_{TS}^{\mathrm{th}}(i)$ and $C_{TS}^{\mathrm{act}}(i)$ represent the theoretical and actual temperature-salinity coupling values for this window, respectively. Similarly, the temperature-depth coupling deviation $\delta_{TD}(i)$, the salinity-depth coupling deviation $\delta_{SD}(i)$, and the three-parameter comprehensive coupling deviation $\delta_{TSD}(i)$ are calculated and assembled into the coupling deviation matrix. All coupling deviations are then normalized: $\tilde{\delta} = \frac{\delta - \delta_{\min}}{\delta_{\max} - \delta_{\min}}$. In the formula, $\tilde{\delta}$ is the normalized coupling deviation value; $\delta$ is the original coupling deviation value; $\delta_{\min}$ and $\delta_{\max}$ represent the minimum and maximum values of all coupling deviations. The principle behind calculating the coupling deviation is to directly quantify the degree of inconsistency between theoretical expectations and actual observations through absolute differences. Normalization brings different types of deviation to the same scale, facilitating subsequent comprehensive analysis. The purpose of this step is to quantify the degree of difference between actual parameter relationships and theoretical expectations; a larger deviation indicates a higher probability of data anomalies. Coupling deviation analysis can capture anomalous patterns that might be overlooked by single-parameter deviation analysis.
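The absolute-difference deviation and min-max normalization of step S07 can be sketched as below. The function names and the coupling values are hypothetical, and returning zeros when all deviations are equal is an assumption of this sketch, since the text does not say how that degenerate case is handled.

```python
def coupling_deviation(theoretical, actual):
    """Absolute difference between theoretical and actual coupling values."""
    return abs(theoretical - actual)

def min_max_normalize(values):
    """Scale values to [0, 1] using the minimum and maximum of the list."""
    low, high = min(values), max(values)
    if high == low:
        # Degenerate case (all deviations equal): assumed to map to zeros.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

# Hypothetical theoretical and actual coupling values for three windows.
theoretical = [1.0, 0.5, 2.0]
actual = [0.5, 0.5, 0.0]

deviations = [coupling_deviation(t, a) for t, a in zip(theoretical, actual)]
normalized = min_max_normalize(deviations)
print(deviations, normalized)
```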
[0183] The specific implementation method of step S08 is the same as described above, and will not be repeated here.
[0184] The specific implementation of step S09 involves comparing the error judgment value with a preset threshold and marking the error. First, an appropriate error judgment threshold is set by balancing application requirements against acceptable false alarm and false negative rates; a typical value is 0.7-0.85. The error judgment value calculated in step S08 is then compared with the preset threshold. The judgment rule can be expressed as: $M_i = 1$ if $E_i$ exceeds $\tau$, and $M_i = 0$ otherwise. In the formula, $M_i$ is the error marker for the $i$-th time window or data point; $E_i$ is the error judgment value; and $\tau$ is the preset threshold. Different levels of error markers can be categorized as follows: $L_i = 1$ if $E_i \in (\tau, \tau_1]$, $L_i = 2$ if $E_i \in (\tau_1, \tau_2]$, and $L_i = 3$ if $E_i \in (\tau_2, 1]$. In the formula, $L_i$ is the error level for the $i$-th time window or data point; $\tau_1$ and $\tau_2$ are the level boundaries; and 1, 2, and 3 represent minor, moderate, and severe anomalies, respectively. The principle of the threshold method for error determination is to implement a simple and clear binary classification strategy, which is convenient for automated processing. Multi-level error labeling provides a more detailed classification of the degree of anomaly, which is helpful for subsequent differentiated processing.
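The threshold rule and the three error levels of step S09 can be sketched as follows, using the threshold 0.75 and the level ranges 0.75-0.85, 0.85-0.95, and 0.95-1.0 given in Example 2. The function name is hypothetical, and assigning a value that falls exactly on a boundary to the lower level is an assumption of this sketch.

```python
def label_error(value, threshold=0.75, level_bounds=(0.85, 0.95)):
    """Return (marker, level) for one error judgment value.

    marker is 1 when value exceeds the threshold, else 0.
    level is 0 (no error), 1 (minor), 2 (moderate) or 3 (severe).
    Boundary values are assigned to the lower level (assumed convention).
    """
    if value <= threshold:
        return 0, 0
    if value <= level_bounds[0]:
        return 1, 1
    if value <= level_bounds[1]:
        return 1, 2
    return 1, 3

# Hypothetical error judgment values for four windows.
for judgment in (0.60, 0.80, 0.90, 0.97):
    print(judgment, label_error(judgment))
```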
[0185] To better understand and implement this invention, a specific application scenario is provided below as Example 2: Researchers conducted a 30-day ocean observation mission in a certain sea area, using a multi-parameter CTD (temperature, salinity, depth) meter for data acquisition. This CTD meter collects data every 10 seconds, including three parameters: temperature, salinity, and depth. During the observation, the researchers found that traditional single-threshold judgment methods could not accurately identify measurement errors in complex marine environments. Therefore, they decided to apply a self-labeling method for measurement error data from a multi-parameter CTD meter.
[0186] First, researchers constructed function matrices for temperature, salinity, and depth parameter acquisition curves based on historical observation data. Historical temperature, salinity, and depth measurement data from the same sea area over the past five years were collected, totaling 84,320 sets of valid data. Least squares polynomial fitting was performed on these data, with fourth-order polynomials selected for temperature and salinity parameters, and third-order polynomials selected for depth parameters. The polynomial coefficients obtained after fitting are shown in Table 1.
[0187] Table 1. Coefficients of Polynomial Functions for Parameter Acquisition Curves
[0188]
[0189] Based on the above coefficients, temperature parameter acquisition curve function matrices, salinity parameter acquisition curve function matrices, and depth parameter acquisition curve function matrices were constructed, each containing 24 rows (corresponding to 24 hours in a day) and the corresponding number of columns.
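The least squares polynomial fitting used to build the acquisition curve functions can be sketched with the normal equations. This is a standard-library illustration only: the function name and the sample data are hypothetical, and for fourth-order fits on real data a numerically more robust solver would normally be preferred over the normal equations.

```python
def polyfit_least_squares(xs, ys, degree):
    """Return c[0..degree] minimizing sum((y - sum(c[k] * x**k))**2)."""
    m = degree + 1
    # Normal equations (V^T V) c = V^T y with V[j][k] = xs[j] ** k.
    a = [[sum(x ** (r + k) for x in xs) for k in range(m)] for r in range(m)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for k in range(col, m):
                a[r][k] -= factor * a[col][k]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for r in reversed(range(m)):
        tail = sum(a[r][k] * coeffs[k] for k in range(r + 1, m))
        coeffs[r] = (b[r] - tail) / a[r][r]
    return coeffs

# Hypothetical data generated from 1 + 2x + 0.5x^2, fitted with degree 2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x + 0.5 * x * x for x in xs]
coefficients = polyfit_least_squares(xs, ys, 2)
print([round(c, 6) for c in coefficients])
```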
[0190] Next, the researchers segmented the newly sampled data according to time windows. The time window length was set to 10 minutes, meaning each window contained 60 sampling points. The 30 days of continuous observation data generated a total of 4320 time windows, and within each window, temperature parameter segment matrices, salinity parameter segment matrices, and depth parameter segment matrices were generated.
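The window arithmetic above can be checked with a short sketch. The `segment` helper is hypothetical, and dropping an incomplete trailing window is an assumption; the sampling interval, window length, and mission duration are the values stated in the example.

```python
def segment(samples, window_size):
    """Split a sequence into consecutive full windows, dropping any remainder."""
    count = len(samples) // window_size
    return [samples[k * window_size:(k + 1) * window_size] for k in range(count)]

SAMPLE_INTERVAL_S = 10        # one sample every 10 seconds
WINDOW_LENGTH_S = 10 * 60     # 10-minute windows
DAYS = 30                     # length of the observation mission

points_per_window = WINDOW_LENGTH_S // SAMPLE_INTERVAL_S
total_samples = DAYS * 24 * 3600 // SAMPLE_INTERVAL_S
# Sample indices stand in for the measured values.
windows = segment(list(range(total_samples)), points_per_window)
print(points_per_window, total_samples, len(windows))
```

The counts agree with the example: 60 sampling points per window and 4320 windows over 30 days.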
[0191] Then, the deviation between the parameter segment matrix and the parameter acquisition curve function matrix was calculated. For each time window, Euclidean distance was used to calculate the deviation between the actual data and the theoretical prediction for temperature, salinity, and depth. Some calculation results are shown in Table 2:
[0192] Table 2. Parameter Deviation Values for Partial Time Windows
[0193]
[0194] To construct the parameter correlation network graph, researchers calculated the correlation coefficients between parameters. Taking the first time window as an example, the correlation coefficient between temperature and salinity was -0.78, the correlation coefficient between temperature and depth was -0.92, and the correlation coefficient between salinity and depth was 0.85. Based on the average correlation coefficients across all time windows, the adjacency matrix of the parameter correlation network graph was constructed.
[0195] Table 3. Adjacency Matrix of the Parameter Correlation Network Graph
[0196]
[0197] The minimum spanning tree was extracted using Prim's algorithm, retaining the temperature-depth and depth-salinity edges with weights of 0.89 and 0.82, respectively.
[0198] Based on the extracted key parameter correlation paths, researchers established a set of multi-parameter coupled equations. In the temperature-salinity relationship equation, weighting coefficients were set to α1=0.7 and α2=0.3; in the temperature-depth relationship equation, weighting coefficients were set to β1=0.8 and β2=0.2; in the salinity-depth relationship equation, weighting coefficients were set to θ1=0.75 and θ2=0.25; and in the comprehensive temperature-salinity-depth relationship equation, weighting coefficients were set to φ1=0.6 and φ2=0.4. Environmental parameter measurements showed: ambient temperature change rate 0.026 °C/min, seawater density parameter ρ=1025.8 kg/m³, depth gradient coefficient 0.178 m/min, water pressure change rate 1.794 kPa/min, depth stratification coefficient 0.023 /m, and ocean current influence factor 0.135.
[0199] The theoretical coupling value was calculated using the above equations and parameters, and some of the calculation results are shown in Table 4:
[0200] Table 4. Theoretical Coupling Values for Partial Time Windows
[0201]
[0202] Next, the researchers calculated the actual coupling values. The covariance method was used to calculate the actual coupling values of temperature-salinity, temperature-depth, and salinity-depth, and canonical correlation analysis was used to calculate the comprehensive actual coupling value of the three parameters. Some of the calculation results are shown in Table 5:
[0203] Table 5. Actual Coupling Values for Partial Time Windows
[0204]
[0205] Based on the theoretical and actual coupling values, the coupling deviation matrix was calculated. Some of the calculation results are shown in Table 6.
[0206] Table 6. Coupling Deviation Values for Partial Time Windows
[0207]
[0208] After normalizing all deviation values, a feature vector was formed and input into a pre-trained temperature, salinity, and depth (TSD) error labeling neural network model. This model employs a hybrid expert (mixture-of-experts) structure with a multi-head sparse attention mechanism, containing 12 attention heads matched to the number of time windows. The model was trained on 15,000 sets of historical data with manually labeled error samples, using the Adam optimizer with a learning rate of 0.001. After 350 training epochs, it achieved an F1 score of 0.92 on the test set.
[0209] The error determination values and labeling results obtained from the model calculation are shown in Table 7:
[0210] Table 7. Error Judgment Results for Partial Time Windows
[0211]
[0212] Researchers set an error threshold of 0.75. When the error judgment value was greater than 0.75, the corresponding data point was marked as an error. The error levels were divided into: 0.75-0.85 for minor anomalies (level 1), 0.85-0.95 for moderate anomalies (level 2), and 0.95-1.0 for severe anomalies (level 3).
[0213] By applying this method, 87 abnormal windows were detected out of 4320 time windows, including 35 minor abnormalities, 42 moderate abnormalities, and 10 severe abnormalities.
[0214] To verify the effectiveness of the method, the researchers manually checked 10 time windows marked as abnormal and found that 9 of them did indeed have measurement anomalies, achieving an accuracy rate of 90%. At the same time, they randomly selected 100 unmarked time windows for inspection and found that 2 of them had missed detections, with a false negative rate of 2%.
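The counts and rates reported in the example can be re-derived with a few lines of arithmetic. The variable names are hypothetical; all numbers are taken from the two paragraphs above.

```python
# Windows flagged per level and total window count, as stated in the example.
flagged = {"minor": 35, "moderate": 42, "severe": 10}
total_windows = 4320

total_flagged = sum(flagged.values())
flag_rate = total_flagged / total_windows

# Manual check of 10 flagged windows: 9 confirmed as real anomalies.
confirmed_fraction = 9 / 10
# Manual check of 100 unflagged windows: 2 turned out to be missed anomalies.
missed_fraction = 2 / 100

print(total_flagged, round(flag_rate * 100, 2), confirmed_fraction, missed_fraction)
```

The per-level counts sum to the 87 flagged windows, which is about 2% of all windows; the 90% and 2% figures are proportions within the small manually checked samples.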
[0215] Traditional methods for detecting errors in temperature, salinity, and depth measurements primarily rely on threshold judgments for single parameters. For example, when temperature, salinity, or depth values exceed preset ranges, they are considered abnormal. This method is simple and direct, but it has significant drawbacks: first, it cannot detect data that is within reasonable ranges but is actually abnormal; second, it cannot make comprehensive judgments on interrelated parameters; third, it cannot adapt to environmental changes in different sea areas and seasons; and fourth, it is prone to misjudgment in highly fluctuating marine environments.
[0216] The method of this invention achieves automatic error labeling of temperature, salinity, and depth (TSD) measurement data through steps such as establishing a parameter acquisition curve function matrix, calculating parameter deviation values, constructing a parameter correlation network graph, establishing a multi-parameter relationship coupling equation system, calculating theoretical and actual coupling values, and applying a neural network model. Compared with traditional methods, this invention has the following significant advantages: first, it considers the physical correlation between parameters, enabling the detection of correlation anomalies that cannot be found by single-parameter threshold methods; second, it introduces constraints from historical data and physical models, improving the accuracy of judgment; third, it reduces the false alarm rate and false negative rate by adaptively learning complex anomaly patterns through a neural network model; and fourth, it achieves fine-grained error level segmentation, facilitating subsequent differentiated processing.
[0217] It should be noted that the variables involved in this invention are explained in detail in Tables 8, 9, and 10 below.
[0218] Table 8. Variable Explanation Table (Part 1)
[0219]
[0220] Table 9. Variable Explanation Table (Part 2)
[0221]
[0222] Table 10. Variable Explanation Table (Part 3)
[0223]
[0224] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any changes or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in the present invention should be included within the scope of protection of the present invention.
Claims
1. A method for self-labeling measurement error data of a multi-parameter temperature, salinity, and depth instrument, characterized in that it comprises: establishing function matrices for temperature parameter acquisition curves, salinity parameter acquisition curves, and depth parameter acquisition curves; segmenting the newly sampled data according to time windows to form temperature parameter segment matrices, salinity parameter segment matrices, and depth parameter segment matrices; calculating, respectively, the deviations between the temperature parameter segment matrices and the temperature parameter acquisition curve function matrices, the salinity parameter segment matrices and the salinity parameter acquisition curve function matrices, and the depth parameter segment matrices and the depth parameter acquisition curve function matrices; constructing a parameter correlation network graph among the temperature, salinity, and depth parameters, using the minimum spanning tree algorithm to extract the key parameter correlation paths, and establishing a set of multi-parameter relationship coupling equations; calculating the theoretical coupling value using the set of multi-parameter relationship coupling equations, and calculating the actual coupling value, to obtain the coupling deviation matrix; inputting the deviation values and the coupling deviation matrix into the temperature, salinity, and depth error labeling neural network model to calculate the error judgment value; and comparing the error judgment value with a preset threshold to label the error data. The multi-parameter relationship coupling equation set includes temperature-salinity relationship equations, temperature-depth relationship equations, salinity-depth relationship equations, and comprehensive temperature-salinity-depth relationship equations. 
The temperature-salinity relationship equations describe the physical correlation between temperature and salinity, with inputs including a temperature parameter segment matrix, a salinity parameter segment matrix, a historical temperature-salinity correlation database, the rate of change of ambient temperature, and seawater density parameters; the output is a theoretical coupling value matrix of temperature-salinity. The temperature-depth relationship equations describe the physical correlation between temperature and depth, with inputs including a temperature parameter segment matrix, a depth parameter segment matrix, a historical temperature-depth correlation database, a depth gradient coefficient, and a rate of change of water pressure; the output is a theoretical coupling value matrix of temperature-depth. The salinity-depth relationship equations describe the physical correlation between salinity and depth, with inputs including a salinity parameter segment matrix, a depth parameter segment matrix, a historical salinity-depth correlation database, a depth stratification coefficient, and ocean current influence factors; the output is a theoretical coupling value matrix of salinity-depth. The comprehensive temperature-salinity-depth relationship equations integrate the comprehensive physical correlation between the three parameters, with inputs including a temperature parameter segment matrix, a salinity parameter segment matrix, a depth parameter segment matrix, a historical temperature-salinity-depth correlation database, and seawater state equation parameters; the output is a comprehensive theoretical coupling value matrix of the three parameters. The specific structure of the temperature-salinity-depth error labeling neural network model is a hybrid expert model based on a multi-head sparse attention mechanism. It includes an input layer, a multi-head sparse attention layer, a parameter encoding layer, a multi-layer perceptron layer, and an output layer. 
The multi-head sparse attention layer is used to handle the relationship between different parameters. The parameter encoding layer maps the original data to a high-dimensional feature space. The multi-layer perceptron layer performs feature fusion and nonlinear mapping. The output layer generates the final error judgment value.
2. The self-labeling method for measurement error data of the multi-parameter temperature, salinity, and depth instrument according to claim 1, characterized in that the parameter acquisition curve function matrix is a matrix composed of polynomial functions obtained by fitting historical measurement data using the least squares method, and is used to describe the regularity of parameter changes over time or space.
3. The self-labeling method for measurement error data of the multi-parameter temperature, salinity, and depth instrument according to claim 2, characterized in that the time window refers to dividing continuously collected data into multiple data segments at fixed time intervals, each data segment containing several consecutive sampling points and being used for local analysis of parameter variation characteristics.
4. The self-labeling method for measurement error data of the multi-parameter temperature, salinity, and depth instrument according to claim 3, characterized in that the deviation refers to the difference between measured data and theoretical model predictions, and is quantified by calculating the Euclidean distance or the Manhattan distance.
5. The self-labeling method for measurement error data of the multi-parameter temperature, salinity, and depth instrument according to claim 4, characterized in that the theoretical coupling values include the temperature-salinity theoretical coupling value matrix, the temperature-depth theoretical coupling value matrix, the salinity-depth theoretical coupling value matrix, and the three-parameter comprehensive theoretical coupling value matrix; and the actual coupling values refer to the actual mutual influence values between parameters calculated directly from actual measurement data, including the temperature-salinity actual coupling value matrix, the temperature-depth actual coupling value matrix, the salinity-depth actual coupling value matrix, and the three-parameter comprehensive actual coupling value matrix.