Method for distilling an embedded lightweight model from an ultra-large-parameter model for dam safety monitoring

By constructing a cloud-based teacher model and using knowledge distillation to train an embedded student model, the method addresses the insufficient intelligence of data analysis in dam safety monitoring systems. The resulting lightweight model runs stably and performs real-time analysis on edge devices, improving the intelligence and real-time performance of dam safety monitoring.

CN122196855A · Pending · Publication Date: 2026-06-12 · CHANGJIANG SPATIAL INFORMATION TECH ENG CO LTD (WUHAN) +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHANGJIANG SPATIAL INFORMATION TECH ENG CO LTD (WUHAN)
Filing Date
2026-05-18
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

The dam safety monitoring system lacks sufficient intelligence in data analysis, large-scale cloud models are difficult to deploy directly on edge devices, and field devices lack autonomous analysis capabilities when communication is unstable.

Method used

A cloud-based teacher model with ultra-large parameters is constructed, and an embedded student model is trained using the knowledge distillation method. Multi-stage distillation training and embedded deployment constraint optimization are performed to generate a lightweight model, which is then deployed to edge monitoring devices for real-time analysis.

Benefits of technology

It enables stable operation on resource-constrained embedded devices, improves the data analysis and real-time response capabilities of the dam safety monitoring system, reduces reliance on high-performance servers and cloud computing resources, and enhances the level of risk warning.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122196855A_ABST

Abstract

The application relates to a method for distilling an embedded lightweight model from an ultra-large-parameter model for dam safety monitoring. The method first collects dam image data and multi-type sensor monitoring data to construct a multimodal monitoring dataset, and performs data cleaning, time synchronization and feature normalization. Then, an ultra-large-parameter teacher model is constructed in the cloud to extract features from the multimodal data, identify the structural state, and generate distillation training samples. On this basis, a student model is constructed and trained with a combination of prediction distillation and feature distillation so that it learns the discrimination ability of the teacher model. The student model is then optimized through model pruning, parameter quantization and structure compression to generate an embedded lightweight model, which is deployed on an edge monitoring device for real-time analysis of dam monitoring data and early warning of structural anomalies. The method reduces computational complexity while maintaining recognition accuracy and is suitable for real-time dam safety monitoring in embedded environments.

Description

Technical Field

[0001] This application relates to the fields of dam safety monitoring and edge intelligent computing, and in particular to a method for distilling an embedded lightweight model from an ultra-large-parameter model for dam safety monitoring.

Background Technology

[0002] With the continuous expansion of water conservancy projects and the extended service life of dams, dam safety monitoring systems require long-term, continuous, and high-precision monitoring and analysis of the dam structure's operational status. Current dam monitoring systems typically utilize various types of sensors, including those for displacement, stress, strain, seepage pressure, and temperature, to collect real-time data on the dam structure's condition, which is then centrally managed and analyzed through a data platform. As the number of monitoring devices increases and the monitoring cycle lengthens, the scale of data generated by dam safety monitoring systems continues to grow, placing higher demands on data analysis capabilities, real-time processing capabilities, and intelligent identification capabilities. Especially under complex operating environments and extreme conditions, monitoring systems not only need to comprehensively analyze multi-source monitoring data but also need to possess the ability to quickly identify and provide risk warnings for structural anomalies.

[0003] In related technologies, to improve the efficiency of monitoring data analysis, some technical solutions have begun to introduce deep learning models or large-scale neural network models to analyze and process monitoring data. For example, deep learning models are used to identify anomalies in sensor monitoring data, or computer vision methods are used to analyze structural deformation in dam surface images. With the development of artificial intelligence technology, some research has further attempted to use deep learning models with larger parameter scales to comprehensively analyze multimodal monitoring data, in order to improve the ability to identify complex structural states and judge trends. However, because such models usually have a large parameter scale and high computational complexity, they often need to be deployed on high-performance servers or cloud computing platforms. Therefore, in actual engineering monitoring scenarios, data analysis still mainly relies on cloud computing platforms.

[0004] However, in practical applications of dam safety monitoring, monitoring equipment is often distributed in mountainous or remote areas, and network communication conditions may be affected by environmental factors. When communication links experience delays or interruptions, monitoring systems relying on cloud computing struggle to obtain analysis results in a timely manner, thus affecting the real-time performance of anomaly identification and early warning responses. Furthermore, large-scale deep learning models running on embedded devices are limited by factors such as computing resources, storage space, and power consumption. Existing model compression or lightweighting methods still have limitations in maintaining model analytical capabilities and cannot simultaneously meet the requirements of multimodal data analysis and embedded deployment.

[0005] To address these issues, a method for distilling an embedded lightweight model from a large-parameter model for dam safety monitoring is proposed.

Summary of the Invention

[0006] This application provides a method for distilling an embedded lightweight model using a large parameter model for dam safety monitoring, in order to solve the problems in related technologies such as insufficient intelligence in monitoring data analysis, difficulty in directly deploying large-scale cloud models on edge devices, and lack of autonomous analysis capabilities of field devices under unstable communication conditions in dam safety monitoring systems.

[0007] First, a method for distilling an embedded lightweight model from a large-parameter model for dam safety monitoring is provided, comprising the following steps:
S1. Collect multimodal data for dam safety monitoring and construct a monitoring dataset, where the multimodal data include dam image data and sensor time-series monitoring data; perform data cleaning, time synchronization and feature normalization on the multimodal data to form a standardized monitoring dataset.
S2. Based on the monitoring dataset, construct a high-parameter teacher model in the cloud that extracts features from the multimodal data, identifies structural states, and outputs dam operating-state prediction results together with the corresponding intermediate feature representations.
S3. Use the teacher model to run inference on the monitoring dataset, and record its prediction results and intermediate-layer feature representations to construct a distillation training sample set.
S4. Construct a student model and perform multi-stage distillation training on it using the distillation training sample set, so that the student model learns both the prediction results and the feature representations of the teacher model.
S5. Apply embedded deployment constraint optimization to the trained student model, including model parameter compression, weight quantization and computational structure optimization, to generate an embedded lightweight model.
S6. Deploy the embedded lightweight model to the edge monitoring device, collect monitoring data in real time, and call the embedded lightweight model for inference to identify dam structural anomalies and issue early warnings.

[0008] In some embodiments, the multimodal data includes dam surface image data and sensor time-series monitoring data. The dam surface image data is acquired by machine vision monitoring terminals installed at key structural locations of the dam. These terminals include industrial cameras, image acquisition modules, and edge processing modules, used for continuous image acquisition of dam surface cracks, seepage traces, structural displacement characteristics, and surface morphology changes. The sensor time-series monitoring data is acquired by various types of monitoring sensors installed inside or on the dam structure. These sensors include one or more of displacement sensors, stress sensors, strain sensors, seepage pressure sensors, and temperature sensors, used to acquire information about the dam structure's operational status. The machine vision monitoring terminals and sensors transmit the acquired data to a data processing system via a monitoring network, thereby forming multimodal monitoring data containing both image and structural monitoring information. The dam surface image data and sensor time-series monitoring data are stored together using a unified timestamp, device number, and monitoring point number, forming a multimodal sample record corresponding to the monitoring dataset.
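The joint storage described above can be pictured as one record per unified timestamp, device number and monitoring point number. The following Python sketch is illustrative only: the class name `MultimodalRecord`, its field names, and the use of a file path to reference the image are assumptions, not structures defined in this application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MultimodalRecord:
    """One multimodal sample record (hypothetical field names)."""
    timestamp: float   # unified timestamp, e.g. Unix seconds
    device_id: str     # device number of the acquiring terminal
    point_id: str      # monitoring point number
    image_path: str    # reference to the dam surface image for this instant
    sensors: Dict[str, float] = field(default_factory=dict)  # sensor type -> reading


def group_by_key(records: List[MultimodalRecord]) -> Dict[Tuple[float, str, str], MultimodalRecord]:
    """Index records by (timestamp, device_id, point_id).

    A later record with the same key overwrites an earlier one.
    """
    index: Dict[Tuple[float, str, str], MultimodalRecord] = {}
    for rec in records:
        index[(rec.timestamp, rec.device_id, rec.point_id)] = rec
    return index
```

A record such as `MultimodalRecord(1717200000.0, "CAM-01", "P-12", "img/0001.jpg", {"seepage_kPa": 42.5})` (hypothetical values) can then be looked up by its (timestamp, device, point) key.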

[0009] In some embodiments, the data preprocessing process in step S1 includes multimodal data cleaning, data synchronization, and feature normalization. The image data preprocessing includes image noise filtering, image resolution unification, and image brightness and contrast adjustment to eliminate image quality differences caused by changes in ambient lighting or equipment noise during acquisition. The sensor monitoring data preprocessing includes outlier detection and removal, missing data interpolation, and time series smoothing to eliminate sensor measurement errors and data fluctuations. After data cleaning, the image data and sensor data are synchronized using timestamp information recorded by the monitoring equipment to align the multimodal data in the same time dimension. Subsequently, feature normalization is performed on the processed multimodal data to represent different types of monitoring data at a unified feature scale.
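The sensor-side cleaning steps listed above (outlier removal, missing-value interpolation, smoothing) and the subsequent feature normalization can be sketched in plain Python. The specific choices here, a median/MAD outlier rule with threshold 3.5, linear interpolation, a centered moving average, and min-max scaling, are common defaults assumed for illustration; the application does not specify which methods are used.

```python
import statistics
from typing import List, Optional, Sequence


def mark_outliers(series: Sequence[Optional[float]], k: float = 3.5) -> List[Optional[float]]:
    """Replace outliers with None using a modified z-score (median / MAD) rule."""
    values = [v for v in series if v is not None]
    if not values:
        return list(series)
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return list(series)  # no spread: nothing can be flagged robustly
    return [None if v is not None and 0.6745 * abs(v - med) / mad > k else v
            for v in series]


def interpolate_missing(series: Sequence[Optional[float]]) -> List[float]:
    """Fill None gaps linearly between valid neighbours; edges copy the nearest value."""
    valid = [i for i, v in enumerate(series) if v is not None]
    if not valid:
        raise ValueError("series has no valid samples")
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            w = (i - left) / (right - left)
            out[i] = series[left] + w * (series[right] - series[left])
    return out


def moving_average(series: Sequence[float], window: int = 3) -> List[float]:
    """Centered moving average; windows are truncated at the series edges."""
    half = window // 2
    out = []
    for i in range(len(series)):
        seg = series[max(0, i - half):min(len(series), i + half + 1)]
        out.append(sum(seg) / len(seg))
    return out


def min_max_normalize(series: Sequence[float]) -> List[float]:
    """Scale values to [0, 1] so different sensor types share one feature scale."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)
    return [(v - lo) / (hi - lo) for v in series]


def preprocess(series: Sequence[Optional[float]]) -> List[float]:
    return min_max_normalize(moving_average(interpolate_missing(mark_outliers(series))))
```

For example, in the series `[1.0, 1.1, None, 1.3, 50.0, 1.5]` the spike `50.0` is flagged, both gaps are refilled by interpolation, and the result is smoothed and scaled to [0, 1].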


[0010] In some embodiments, the teacher model is a high-parameter deep learning model deployed on a cloud computing platform. This model consists of a multimodal feature extraction network and a multimodal fusion network. Image data is used for feature extraction via a convolutional neural network or a visual Transformer network to obtain dam image feature vectors. Sensor time-series data is used for feature extraction via a time-series neural network, including a long short-term memory network, a gated recurrent unit (GRU) network, or a time-series Transformer network, to extract time-series variation features from dam monitoring data. After obtaining image and time-series features, the multimodal fusion network jointly models different modal features to achieve dam operation status identification. The teacher model is trained on a high-performance cloud computing platform and its parameters are optimized using a large amount of historical monitoring data to obtain a structural state identification model for outputting dam operation status prediction results and intermediate feature representations, providing a knowledge source for student model distillation training.

[0011] In some embodiments, the distillation training sample set is generated by teacher model inference. The preprocessed multimodal monitoring data are input into the teacher model, which outputs the dam structural state prediction result; the intermediate-layer feature representation of the teacher model is recorded at the same time. The prediction result includes the dam operating state category, the abnormal risk level, and the corresponding probability distribution. The intermediate-layer feature representation is the intermediate feature vector from the teacher model's feature extraction network or fusion network and reflects the model's deep feature expression of the multimodal monitoring data. The original monitoring data, the teacher model prediction result, and the intermediate-layer feature vector are then combined to form the distillation training sample set, in which the prediction result and the intermediate-layer feature vector serve as supervision information for the student model's prediction distillation training and feature distillation training, respectively.
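A minimal sketch of this sample-set construction, assuming a teacher callable that returns a probability distribution over structural states together with one intermediate feature vector per input. `DistillSample`, `build_distillation_set` and `toy_teacher` are hypothetical names used only to illustrate how the original data and the two kinds of supervision information are stored side by side; the toy teacher is not a real model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Assumed teacher interface: input -> (probability distribution, intermediate features).
TeacherFn = Callable[[Sequence[float]], Tuple[List[float], List[float]]]


@dataclass
class DistillSample:
    inputs: List[float]            # original (preprocessed) monitoring data
    teacher_probs: List[float]     # supervision for prediction distillation
    teacher_features: List[float]  # supervision for feature distillation
    predicted_state: int           # index of the most probable state, kept for inspection


def build_distillation_set(dataset: Sequence[Sequence[float]],
                           teacher: TeacherFn) -> List[DistillSample]:
    samples = []
    for x in dataset:
        probs, feats = teacher(x)  # one teacher inference pass per sample
        state = max(range(len(probs)), key=probs.__getitem__)
        samples.append(DistillSample(list(x), list(probs), list(feats), state))
    return samples


def toy_teacher(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Hypothetical stand-in for the cloud teacher, for demonstration only."""
    features = [2.0 * v for v in x]
    if sum(x) < 1.0:
        return [0.7, 0.2, 0.1, 0.0], features
    return [0.1, 0.2, 0.3, 0.4], features
```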

[0012] In some embodiments, the multi-stage distillation training in step S4 includes a prediction distillation stage and a feature distillation stage. In the prediction distillation stage, the probability distribution output by the teacher model is used as the supervision target to constrain, through a loss term, the probability distribution output by the student model. In the feature distillation stage, the intermediate-layer feature representation of the teacher model is used as the supervision target to constrain the intermediate feature representation of the corresponding student layer. During distillation training, a comprehensive loss function comprising the prediction distillation loss and the feature distillation loss is constructed, and the student model parameters are iteratively updated with this comprehensive loss as the objective function.
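The comprehensive loss can be illustrated with the widely used temperature-scaled formulation of knowledge distillation (Hinton et al.): a KL-divergence term between softened teacher and student distributions for prediction distillation, plus a mean-squared-error term between intermediate features for feature distillation. The application does not give the exact loss; the temperature, the weights `alpha` and `beta`, the optional hard-label cross-entropy term, and the assumption that student and teacher features have equal dimension (in practice a projection layer is often added) are all illustrative choices.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                      student_feat: Sequence[float], teacher_feat: Sequence[float],
                      label: Optional[int] = None, temperature: float = 4.0,
                      alpha: float = 0.5, beta: float = 1.0) -> float:
    # Prediction distillation: match softened distributions; T^2 keeps gradient scale comparable.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    pred_loss = temperature ** 2 * kl_divergence(p_teacher, p_student)
    # Optional hard-label term using the structural state label (an assumption).
    if label is not None:
        ce = -math.log(softmax(student_logits)[label] + 1e-12)
        pred_loss = alpha * ce + (1.0 - alpha) * pred_loss
    # Feature distillation: assumes student and teacher features have equal dimension.
    feat_loss = mse(student_feat, teacher_feat)
    return pred_loss + beta * feat_loss
```

Logits are used so that the temperature can be applied; if only the teacher's probability distribution is stored, the logarithms of its non-zero probabilities can serve as logits, since softmax is unaffected by an additive constant.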

[0013] In some embodiments, the embedded deployment constraint optimization includes model pruning, parameter quantization, and model structure compression. Model pruning involves sorting neural network connections, convolutional channels, or neurons in the trained student model based on the absolute value of neural network connection weights, channel norm, or gradient sensitivity, and deleting neural network connections, convolutional channels, or neurons whose sorting results satisfy preset pruning conditions. Parameter quantization involves converting floating-point weight parameters in the student model into fixed-point or integer weight parameters of a preset bit width. Model structure compression involves merging nodes, deleting redundant branches, and transforming inference operators in the computation graph of the student model to form a compressed model structure adapted for execution by an embedded processor.
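Two of these optimizations are sketched below on a flat list of weights: global magnitude pruning (sorting by absolute weight and zeroing the smallest fraction) and symmetric per-tensor int8 quantization. The 8-bit width, the per-tensor scale and the `sparsity` parameter are assumptions; the application only states that a preset bit width and preset pruning conditions are used. The helper `model_size_bytes` is a hypothetical estimate of raw weight storage.

```python
from typing import List, Sequence, Tuple


def magnitude_prune(weights: Sequence[float], sparsity: float) -> List[float]:
    """Zero out the fraction `sparsity` of weights with the smallest absolute values."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(len(weights) * sparsity)  # number of weights to remove
    pruned = list(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to signed 8-bit integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Recover approximate floating-point weights from integer codes."""
    return [v * scale for v in q]


def model_size_bytes(num_params: int, bits: int) -> float:
    """Raw weight storage, ignoring file headers and metadata."""
    return num_params * bits / 8
```

For example, storing 1,000 parameters takes 4,000 bytes at 32 bits and 1,000 bytes at 8 bits, a fourfold reduction before any pruning.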

[0014] In some embodiments, the embedded lightweight model includes a network structure after model pruning, weight parameters after parameter quantization, and a computation graph after model structure compression; the number of parameters, model file size, or single inference computation of the embedded lightweight model is less than the number of parameters, model file size, or single inference computation of the teacher model; the embedded lightweight model is used to receive dam image data and sensor time-series monitoring data collected by edge monitoring equipment, perform joint inference on the dam image data and the sensor time-series monitoring data, and output the dam operating status category, abnormal risk level, or structural abnormality judgment result.

[0015] In some embodiments, the edge monitoring device includes an embedded processing terminal or an edge computing node. The edge monitoring device communicates with a cloud platform through a monitoring network to receive monitoring data in real time and perform model inference tasks. The edge monitoring device includes a data acquisition module, a data processing module, and a model inference module. The data acquisition module is used to collect dam image data and sensor monitoring data. The data processing module is used to preprocess the monitoring data. The model inference module is used to call an embedded lightweight model to identify the structural state.

[0016] In some embodiments, after the embedded lightweight model is deployed, the edge monitoring device collects dam image data and sensor time-series monitoring data in real time, and calls the embedded lightweight model to perform online inference analysis; when the embedded lightweight model outputs an anomaly judgment result of the dam structure or the anomaly risk level reaches the preset warning condition, the edge monitoring device generates corresponding warning information; at the same time, the edge monitoring device uploads the monitoring data, inference results and anomaly samples to the cloud platform, the cloud platform updates the teacher model or distills the training sample set based on the uploaded data, and distributes the regenerated embedded lightweight model to the edge monitoring device.
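The edge-side behaviour described here, raising a warning when the model reports an anomaly or the risk level reaches a preset condition and queuing data for upload to the cloud platform, can be sketched as follows. The threshold (early-warning level or above), the 0 to 3 level encoding, and the names `EdgeNode`, `EdgeResult` and `flush` are hypothetical. Keeping an upload queue that is flushed when the link is available is one assumed way to tolerate the unstable communication discussed in the background.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical level encoding: 0 normal, 1 attention, 2 early warning, 3 alarm.
DEFAULT_WARNING_LEVEL = 2  # assumed preset warning condition


@dataclass
class EdgeResult:
    state_level: int   # risk level output by the embedded lightweight model
    is_anomaly: bool   # structural anomaly judgment


@dataclass
class EdgeNode:
    warning_level: int = DEFAULT_WARNING_LEVEL
    upload_queue: List[dict] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)

    def handle(self, timestamp: float, data: dict, result: EdgeResult) -> Optional[str]:
        """Raise a warning if needed and queue the data and result for the cloud."""
        message = None
        if result.is_anomaly or result.state_level >= self.warning_level:
            message = f"t={timestamp}: structural anomaly warning, level {result.state_level}"
            self.warnings.append(message)
        self.upload_queue.append({
            "timestamp": timestamp,
            "data": data,
            "state_level": result.state_level,
            "anomaly_sample": message is not None,
        })
        return message

    def flush(self) -> List[dict]:
        """Hand queued items to the uploader (e.g. once the cloud link is available)."""
        batch, self.upload_queue = self.upload_queue, []
        return batch
```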

[0017] The beneficial effects of the technical solution provided in this application include:
1. This application constructs a cloud-based teacher model with ultra-large parameters and trains an embedded student model by knowledge distillation, so that the lightweight model inherits the reasoning ability of the teacher model. While maintaining the accuracy of structural state recognition, it significantly reduces the parameter scale and computational complexity, allowing the model to run stably on resource-constrained embedded devices.
2. By constructing a distillation training sample set, the student model learns not only the prediction results of the teacher model but also its feature representations, effectively transferring the intelligent analysis capabilities of the cloud-based ultra-large model to edge devices and improving the overall data analysis capability of the system.
3. This application integrates machine vision monitoring data and sensor monitoring data for multimodal analysis, so that the model reflects the dam's operating status more comprehensively and better identifies safety hazards such as abnormal deformation, abnormal seepage, and structural risks, raising the risk warning level of the dam safety monitoring system.
4. By compressing parameters, quantizing weights, and optimizing the computation graph, this application enables efficient inference of the student model on embedded devices, reducing inference latency and improving the real-time response of the monitoring system, thus meeting the need for real-time data analysis in dam safety monitoring.
5. Because the embedded lightweight model can be deployed and run directly on edge monitoring devices, it reduces dependence on high-performance servers and cloud computing resources, lowering system operating costs and increasing the engineering application value of dam safety monitoring systems.
6. By establishing a collaborative mechanism between cloud training and edge deployment, this application can continuously optimize the model with newly collected monitoring data during system operation and regenerate the embedded lightweight model, achieving continuous iterative updates and steadily improving monitoring accuracy and risk identification.

Attached Figure Description

[0018] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0019] Figure 1 is a flowchart of the overall technical process of the method for distilling an embedded lightweight model from an ultra-large-parameter model for dam safety monitoring according to the present invention;
Figure 2 is a flowchart of multimodal monitoring data acquisition and preprocessing;
Figure 3 is a flowchart of the construction and inference of the cloud-based ultra-large-parameter teacher model;
Figure 4 is a flowchart of the construction of the distillation training sample set;
Figure 5 is a flowchart of the multi-stage distillation training of the student model;
Figure 6 is a schematic diagram comparing the computing power of the teacher model and the student model;
Figure 7 is a graph of the loss function during distillation training;
Figure 8 is a graph of inference performance tests of the embedded lightweight model;
Figure 9 is a statistical chart of dam structural anomaly identification results;
Figure 10 is a schematic diagram of the collaborative update mechanism between cloud training and edge devices.

Detailed Implementation

[0020] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0021] This application provides a method for distilling an embedded lightweight model using a large parameter model for dam safety monitoring. This method can solve the problems in related technologies such as insufficient intelligence in monitoring data analysis, difficulty in directly deploying large-scale cloud models on edge devices, and lack of autonomous analysis capabilities of field devices under unstable communication conditions in dam safety monitoring systems.

[0022] Referring to Figures 1-10, this invention provides a method for distilling an embedded lightweight model from a large-parameter model for dam safety monitoring, comprising the following steps:
(I) Step 1: Construct a multimodal monitoring dataset for dams
In this step, multi-source monitoring data generated during dam operation are collected, time-aligned, and preprocessed to construct a multimodal dam monitoring dataset for subsequent model distillation training. Structuring the data from different sources and giving them a common feature representation allows them to be processed uniformly by large-scale deep learning models, providing the data foundation for teacher model training and student model distillation.

[0023] (1) Multi-source monitoring data acquisition
The dam safety monitoring system utilizes various types of monitoring equipment to collect real-time data on the dam's operational status. This monitoring equipment includes machine vision monitoring devices and structural monitoring sensor devices.

[0024] The machine vision monitoring equipment acquires images of the dam surface, capturing visual information such as surface deformation characteristics, crack development, and seepage traces; the structural monitoring sensor equipment collects operating parameters of the dam interior or key structural parts, including physical monitoring data such as displacement, stress, strain, seepage pressure, and temperature.

[0025] At time $t$, the multimodal monitoring data collected by the dam monitoring system can be represented as:
$$X_t = \{ I_t, S_t \}$$
where $X_t$ denotes the multimodal monitoring data of the dam at time $t$, $I_t$ denotes the dam surface image data, and $S_t$ denotes the set of sensor monitoring data collected at the same time.

[0026] The sensor monitoring data set $S_t$ consists of data from multiple types of sensors and can be represented as:
$$S_t = \{ s_t^{1}, s_t^{2}, \dots, s_t^{n} \}$$
where $n$ is the number of monitoring sensor types and $s_t^{i}$ denotes the data collected by the $i$-th type of sensor at time $t$.

[0027] In a typical dam monitoring system, the aforementioned sensor data usually includes structural operating parameters such as dam displacement monitoring data, stress monitoring data, strain monitoring data, seepage pressure monitoring data, and temperature monitoring data.

[0028] The data collected by the aforementioned equipment can reflect the dam's operational status from multiple dimensions, including structural deformation, internal stress, seepage state, and environmental changes, providing basic data support for dam structural safety assessment.

[0029] (2) Constructing multimodal time series data
Since dam safety monitoring is a long-term, continuous monitoring process, it is necessary to construct time series data from monitoring data collected at different times to reflect the evolution of the dam structure over time.

[0030] Suppose the data sequence collected by the monitoring system over a continuous time period is:
$$X = \{ X_1, X_2, \dots, X_T \}$$
where $T$ is the length of the monitoring time series and $X_k$ denotes the multimodal monitoring data collected at the $k$-th time point.

[0031] In this way, the dam monitoring data can be represented as a multimodal time-series dataset containing both image and sensor data:
$$\mathcal{X} = \{ (I_k, S_k) \mid k = 1, \dots, T \}$$
This data structure retains both spatial information (images) and structural operating information (sensor data), providing a foundation for the model to learn how the dam structure evolves.

[0032] (3) Time alignment of multi-source monitoring data
In practical engineering monitoring systems, different monitoring devices often sample at different rates. For example, vision devices may capture images at intervals of minutes, while some structural sensors sample at intervals of seconds. Multi-source monitoring data therefore need to be time-synchronized.

[0033] Specifically, the different time series are mapped onto a unified time axis by resampling or interpolation, forming unified time-series data:
$$\tilde{X}_t = \{ \tilde{I}_t, \tilde{S}_t \}$$
where $\tilde{I}_t$ denotes the time-aligned image data and $\tilde{S}_t$ denotes the time-aligned sensor monitoring data.
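One way to realize the resampling or interpolation step is to linearly interpolate every sensor series onto the image timestamps, which then serve as the unified time axis. This is a sketch under that assumption (a regular time grid would work the same way); values outside a sensor's recorded range are clamped to the nearest endpoint, and the function names `resample_linear` and `align_to_images` are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Sequence, Tuple


def resample_linear(times: Sequence[float], values: Sequence[float],
                    target_times: Sequence[float]) -> List[float]:
    """Linearly interpolate a sorted series onto target_times, clamping at the ends."""
    if not times or len(times) != len(values):
        raise ValueError("times and values must be non-empty and of equal length")
    out = []
    for t in target_times:
        j = bisect_left(times, t)          # first index with times[j] >= t
        if j == 0:
            out.append(values[0])          # before (or at) the first sample
        elif j == len(times):
            out.append(values[-1])         # after the last sample
        elif times[j] == t:
            out.append(values[j])          # exact timestamp match
        else:
            t0, t1 = times[j - 1], times[j]
            w = (t - t0) / (t1 - t0)
            out.append(values[j - 1] + w * (values[j] - values[j - 1]))
    return out


def align_to_images(image_times: Sequence[float],
                    sensor_series: Dict[str, Tuple[Sequence[float], Sequence[float]]]
                    ) -> List[Dict[str, float]]:
    """Return one dict of sensor readings per image timestamp."""
    aligned = {name: resample_linear(ts, vs, image_times)
               for name, (ts, vs) in sensor_series.items()}
    return [{name: aligned[name][k] for name in aligned} for k in range(len(image_times))]
```

For instance, a displacement sensor sampled every 10 s can be mapped onto images captured once per minute, giving exactly one sensor reading per image.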

[0034] Time alignment ensures that monitoring data of different modalities share consistent temporal semantics within the same time window, which facilitates subsequent multimodal data fusion and analysis.

[0035] (4) Construction of multimodal feature representation
To improve model training efficiency, feature extraction processing is required on the raw monitoring data.

[0036] The time-aligned image data $\tilde{I}_t$ are processed with a visual feature extraction function:
$$F_t^{v} = f_v(\tilde{I}_t)$$
where $F_t^{v}$ denotes the visual feature vector extracted from the dam image data and $f_v(\cdot)$ denotes the visual feature extraction function.

[0037] The time-aligned sensor monitoring data $\tilde{S}_t$ are processed with a time-series feature extraction function:
$$F_t^{s} = f_s(\tilde{S}_t)$$
where $F_t^{s}$ denotes the time-series feature vector extracted from the sensor monitoring data and $f_s(\cdot)$ denotes the time-series feature extraction function.

[0038] Finally, the visual and sensor features are fused into a unified multimodal feature representation:
$$F_t = [\, F_t^{v} \,;\, F_t^{s} \,]$$
where $F_t$ denotes the fused multimodal feature vector and $[\,\cdot\,;\,\cdot\,]$ denotes feature concatenation.

[0039] This representation contains both surface morphology information of the dam structure and structural operating-state information, and can therefore describe the state of the dam structure comprehensively.

[0040] (5) Constructing the distillation training dataset
Based on the multimodal feature representations above, a dataset for model distillation training is constructed:
$$\mathcal{D} = \{ (F_k, y_k) \mid k = 1, \dots, T \}$$
where $\mathcal{D}$ denotes the distillation training dataset, $F_k$ denotes the multimodal input features, and $y_k$ denotes the corresponding structural state label.

[0041] The structural status labels include normal status, attention status, early warning status, and alarm status, which are used to represent different levels of structural safety status.
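The four ordered labels can be encoded as integers, which makes comparisons such as "early warning or above" straightforward and allows one-hot targets to be built for training. The integer codes and the helper names `one_hot` and `build_labelled_dataset` are assumptions for illustration.

```python
from enum import IntEnum
from typing import List, Sequence, Tuple


class StructuralState(IntEnum):
    """Four ordered structural safety levels (integer codes are an assumption)."""
    NORMAL = 0
    ATTENTION = 1
    EARLY_WARNING = 2
    ALARM = 3


def one_hot(state: StructuralState) -> List[float]:
    """One-hot target vector over the four levels."""
    vec = [0.0] * len(StructuralState)
    vec[state] = 1.0
    return vec


def build_labelled_dataset(features: Sequence[Sequence[float]],
                           labels: Sequence[int]) -> List[Tuple[List[float], StructuralState]]:
    """Pair each multimodal feature vector with its structural state label."""
    if len(features) != len(labels):
        raise ValueError("each feature vector needs exactly one label")
    return [(list(f), StructuralState(y)) for f, y in zip(features, labels)]
```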

[0042] Through the above steps, a complete multimodal monitoring dataset for dams can be constructed, providing basic data support for subsequent training of ultra-large parameter teacher models and embedded lightweight model distillation.

[0043] (II) Step 2: Constructing a cloud-based teacher model with ultra-large parameters
After completing the construction of the dam multimodal monitoring dataset, it is necessary to establish a cloud-based ultra-large parameter model capable of in-depth analysis of the dam structure's operational status. This model serves as a teacher model, used to learn the complex correlations between dam monitoring data and output structural status judgment results, providing a knowledge source for subsequent lightweight model distillation.

[0044] (1) Construction of the overall structure of the teacher model
In this step, a cloud-based teacher model is built based on a large-scale deep learning network. This model is deployed on a cloud computing server, where high-performance computing resources are used to comprehensively analyze multimodal monitoring data.

[0045] The teacher model consists of the following modules:
Multimodal data encoding module: used to encode the multimodal features obtained in Step 1, so that different types of data can be expressed in a unified feature space.

[0046] Cross-modal feature fusion module: used to establish the correlation between visual information and sensor monitoring data, thereby extracting comprehensive features that can reflect the operating status of the dam structure.

[0047] Structural state reasoning module: used to analyze and judge the operational state of the dam structure based on the fused feature information, and output the corresponding risk level.

[0048] Through the collaborative work of the above modules, the teacher model can conduct in-depth analysis of the dam structure's operational status and establish a mapping relationship between multimodal monitoring data and structural risks.

[0049] (2) Multimodal feature encoding
In order for the model to process image data and sensor time-series data simultaneously, feature encoding of different modal data is required.

[0050] For image features, the surface morphology features of the dam are extracted using a visual coding network; for sensor data, the dynamic changes of structural operating parameters are extracted using a temporal coding network.

[0051] Let the multimodal feature obtained in Step 1 be $F_t$. After processing by the teacher model's encoding module, a high-dimensional feature representation is obtained:
$$H_t = E_{\mathrm{tea}}(F_t)$$
where $H_t$ denotes the high-dimensional feature vector produced by the teacher model's encoder and $E_{\mathrm{tea}}(\cdot)$ denotes the teacher model's multimodal encoding function.

[0052] This encoding process allows monitoring information from different modalities to be mapped into a unified feature space, enabling the model to simultaneously learn the relationship between structural morphology changes and internal operating states.
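The encoding step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's network: a single linear layer with ReLU stands in for each modality's encoder (the visual and temporal coding networks), and both branches are projected to the same hypothetical width d and concatenated into one vector in the shared feature space. All function and argument names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Dense layer without bias: returns weights @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def encode_multimodal(img_feat: Vector, sensor_feat: Vector,
                      w_img: Matrix, w_sen: Matrix) -> Vector:
    """Project each modality into a shared d-dim space, then concatenate.

    w_img has shape (d, len(img_feat)) and w_sen has shape
    (d, len(sensor_feat)), so both branches land in the same space.
    The returned vector has length 2 * d.
    """
    h_img = relu(linear(w_img, img_feat))     # stand-in visual encoder
    h_sen = relu(linear(w_sen, sensor_feat))  # stand-in temporal encoder
    return h_img + h_sen                      # list concatenation
```

In a real system each `linear` call would be replaced by a trained convolutional or temporal network; only the idea of mapping both modalities to a common width is illustrated here.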

[0053] (3) Cross-modal feature fusion After obtaining multimodal encoded features, it is necessary to further establish the correlation between data from different modalities. To this end, a cross-modal feature fusion mechanism is introduced into the teacher model, which uses a deep neural network to fuse features from different sources.

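[0054] The integrated features after fusion are represented as:
$$Z_T = \mathrm{Fuse}_T(H_T)$$
where $Z_T$ is the comprehensive feature vector after fusion, $\mathrm{Fuse}_T(\cdot)$ denotes the multimodal feature fusion operation, and $H_T$ is the encoded multimodal feature of the teacher model.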

[0055] This fusion feature includes both dam surface morphology information and internal structural operating parameter information, thus providing a more comprehensive reflection of the dam's structural condition.
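One way to realize the cross-modal fusion described above is single-head cross-attention, in which the image embedding attends over a set of sensor feature tokens and the attended context is appended to it. The source does not specify the fusion mechanism, so attention is an assumed, swapped-in technique here; key and value projections are omitted for brevity, and all names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_fuse(img_query: Vector, sensor_tokens: List[Vector]) -> Vector:
    """Single-head cross-attention: the image embedding attends over sensor tokens.

    Keys and values are the raw sensor tokens (no learned projections).
    Returns [img_query ; attended sensor context], a vector of length 2 * d.
    """
    d = len(img_query)
    scores = [sum(q * k for q, k in zip(img_query, tok)) / math.sqrt(d)
              for tok in sensor_tokens]
    weights = softmax(scores)  # how strongly the image relates to each token
    context = [sum(w * tok[j] for w, tok in zip(weights, sensor_tokens))
               for j in range(d)]
    return img_query + context
```

The attention weights make the correlation between visual and sensor information explicit: sensor tokens that align with the current image embedding contribute more to the fused vector.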

[0056] (4) Structural state reasoning After obtaining the fused features, the dam's operational status is analyzed and judged through the structural state reasoning module. This module uses a deep neural network to perform nonlinear mapping on the fused features, thereby outputting the dam's structural risk status.

[0057] The structural state prediction result can be expressed as:
$$Y_T = f_T(Z_T)$$
where $Y_T$ is the predicted state of the dam structure, $f_T(\cdot)$ is the state prediction function of the teacher model, and $Z_T$ is the fused feature vector.

[0058] Prediction results typically include multiple risk levels, for example: Normal status, attention status, warning status, alarm status.
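A sketch of the output side of the structural state reasoning module, assuming it produces one raw score (logit) per risk level and converts the scores to probabilities with a softmax. The level names follow the four states listed above; the function name is hypothetical.

```python
import math
from typing import List, Tuple

RISK_LEVELS = ["normal", "attention", "warning", "alarm"]


def predict_state(logits: List[float]) -> Tuple[str, List[float]]:
    """Map four raw scores from the reasoning module to a risk level.

    Returns the most probable level and the full probability vector
    [p_normal, p_attention, p_warning, p_alarm], which sums to 1.
    """
    if len(logits) != len(RISK_LEVELS):
        raise ValueError("expected one logit per risk level")
    m = max(logits)  # stabilize exp()
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return RISK_LEVELS[best], probs
```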

[0059] By continuously analyzing different time series data, the teacher model can identify abnormal trends in dam structure changes, thereby providing reliable supervisory information for subsequent distillation training.

[0060] (5) Teacher model training After the model structure is built, the teacher model needs to be trained using the dam multimodal monitoring dataset obtained in step one. During training, the model parameters are optimized so that the model can accurately learn the mapping relationship between monitoring data and the dam structural state.

[0061] The trained teacher model can: perform unified analysis of multimodal monitoring data; identify abnormal operating conditions of the dam structure; predict trends in structural state changes; and output the structural safety risk level.

[0062] This teacher model serves as a knowledge source in the subsequent knowledge distillation process, guiding lightweight models to learn the reasoning capabilities of large-scale models.

[0063] The cloud-based teacher model built in step two can perform in-depth analysis of multimodal monitoring data and generate structural state prediction results and related reasoning information. In the next step, the teacher model is used to perform inference on the monitoring data and generate distillation training samples, thereby providing supervision information for training the embedded lightweight model.

[0064] (III) Step 3: Constructing the distillation training sample set After completing the construction and training of the cloud-based, high-parameter teacher model, it is necessary to use the teacher model to perform inference analysis on the dam's multimodal monitoring data, thereby generating a distillation sample set for lightweight model training. The goal of this step is to extract high-quality supervision information through the teacher model, enabling the subsequent student model to learn the teacher model's reasoning ability and structural state judgment logic.

[0065] The distillation training samples not only contain monitoring data and their corresponding structural state labels, but also prediction results and feature representation information generated by the teacher model, thus providing multi-level supervision signals for subsequent knowledge distillation training.

[0066] (1) Generation of teacher model reasoning results First, the cloud-based teacher model trained in step two is used to perform inference calculations on the multimodal monitoring data of the dam constructed in step one, thereby obtaining the corresponding structural state prediction results.

[0067] Let the input to the teacher model at time $t$ be the fused multimodal feature $Z_t$ obtained in step two. The teacher model outputs the structural state prediction:
$$\hat{y}_t^{T} = f_T(Z_t)$$
where $\hat{y}_t^{T}$ is the structural state predicted by the teacher model at time $t$ and $f_T(\cdot)$ is its state prediction function. The prediction usually takes the form of a probability distribution giving the probability of each risk level.

[0068] For example:
$$\hat{y}_t^{T} = \left[p_1,\ p_2,\ p_3,\ p_4\right], \qquad \sum_{k=1}^{4} p_k = 1$$
where $p_1$ is the probability that the structure is in the normal state, $p_2$ the probability that it is in the attention state, $p_3$ the probability that it is in the warning state, and $p_4$ the probability that it is in the alarm state.

[0069] In this way, the teacher model's prediction results for the dam monitoring data are obtained.

[0070] (2) Distillation label generation After obtaining the prediction results from the teacher model, it is necessary to construct distillation training labels. Distillation labels include not only traditional structural state classification labels, but also probability distribution information output by the teacher model.

[0071] Let the distillation label at time $t$ be $y_t^{\mathrm{dis}}$, the distillation supervision label generated from the teacher model output $\hat{y}_t^{T}$.

[0072] Unlike traditional supervised learning, which uses only a single class label, distillation labels retain the probability distribution output by the teacher model and therefore reflect the relationships between different structural states more fully.

[0073] This labeling format can provide richer learning information for student models, thereby improving model training effectiveness.
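To illustrate why a probability distribution carries more information than a single class label, the sketch below compares a one-hot label with softmax outputs of the teacher. The temperature parameter is an assumption borrowed from standard knowledge distillation practice and is not stated in the source; at temperature 1 the output is simply the teacher's ordinary distribution. Function names are hypothetical.

```python
import math
from typing import List


def soft_labels(teacher_logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax over the teacher's risk-level logits.

    temperature = 1 gives the teacher's ordinary output distribution;
    temperature > 1 flattens it so the relative scores of the
    non-maximal risk levels become more visible to the student.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in teacher_logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def hard_label(num_classes: int, index: int) -> List[float]:
    """One-hot label as used in conventional supervised learning."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]
```

For teacher logits `[4.0, 2.5, 0.5, -1.0]`, the hard label only says "normal", whereas the soft label also tells the student that "attention" is the most plausible alternative and "alarm" the least.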

[0074] (3) Extraction of teacher model feature information In order for the student model to learn the feature representation ability of the teacher model, it is also necessary to extract the intermediate layer feature information of the teacher model.

[0075] Suppose that the feature representation output by the teacher model at a given intermediate layer is $F_t^{T}$, the high-dimensional feature representation extracted by the teacher model at time $t$. This feature reflects the deep correlations among the multimodal monitoring data.

[0076] By incorporating intermediate feature information into the distillation training samples, the student model can learn not only the prediction results during training, but also the feature representation of the teacher model.

[0077] (4) Construction of distillation sample data structure By combining monitoring data, teacher model prediction results, and feature information, a complete distillation training sample is constructed.

[0078] A distillation training sample can be represented as:
$$\mathcal{T}_t = \left(X_t,\ y_t^{\mathrm{dis}},\ F_t^{T}\right)$$
where $\mathcal{T}_t$ is a distillation training sample; $X_t$ is the multimodal input feature obtained in step one; $y_t^{\mathrm{dis}}$ is the distillation label generated by the teacher model; and $F_t^{T}$ is the feature information extracted by the teacher model.

[0079] Each distillation sample constructed in this manner therefore contains: input information (monitoring data features); output information (teacher model prediction results); and intermediate feature information (the teacher model's feature representation).

[0080] Therefore, it can provide multi-level supervision signals for subsequent knowledge distillation training.

[0081] (5) Construction of distillation sample set After all monitoring time-series data have been processed, the complete distillation training dataset is obtained:
$$\mathcal{D} = \left\{\mathcal{T}_i\right\}_{i=1}^{N}$$
where $N$ is the number of distillation training samples and $\mathcal{T}_i$ is the $i$-th distillation training sample.

[0082] This distillation sample set can comprehensively cover different operating states of the dam, including normal operation and different types of structural anomalies, thus providing a data foundation for subsequent lightweight model distillation training.
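The sample structure and the dataset built from it could be represented as below: each sample holds the multimodal input, the teacher's probability distribution, and the teacher's intermediate features. This is a hypothetical sketch; `teacher_predict` and `teacher_features` are placeholder callables standing in for the cloud teacher's output head and its intermediate layer.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


@dataclass(frozen=True)
class DistillSample:
    """One distillation sample: (input, distillation label, teacher feature)."""
    x: Tuple[float, ...]             # multimodal input features
    soft_label: Tuple[float, ...]    # teacher probability distribution
    teacher_feat: Tuple[float, ...]  # teacher intermediate-layer features


def build_distill_set(
    inputs: Sequence[Vector],
    teacher_predict: Callable[[Vector], Vector],
    teacher_features: Callable[[Vector], Vector],
) -> List[DistillSample]:
    """Run the (frozen) teacher over every input and record its outputs."""
    return [
        DistillSample(tuple(x), tuple(teacher_predict(x)), tuple(teacher_features(x)))
        for x in inputs
    ]
```

Storing the teacher outputs once, rather than querying the teacher during student training, means the large model only has to run in the cloud and only once per sample.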

[0083] The distillation training sample set constructed in step three already contains the prediction results and feature representation information of the teacher model. In the next step, this distillation sample set will be used to train the lightweight student model through knowledge distillation, enabling the student model to learn the reasoning ability of the teacher model, thereby maintaining a high structural state recognition ability despite a small parameter scale.

[0084] (iv) Step 4: Construct the student model and conduct multi-stage distillation training After obtaining the distillation training sample set constructed in step three, a student model with a smaller parameter size needs to be built. This student model then learns the reasoning ability of the teacher model through knowledge distillation. The goal of this step is to enable the student model to maintain a high level of dam structure state recognition capability while significantly reducing the model parameter size and computational complexity, thereby meeting the deployment requirements of embedded devices.

[0085] The student model learns through a multi-stage distillation training method, enabling it to simultaneously learn the teacher model's prediction results, feature representations, and structural state judgment logic.

[0086] (1) Construction of student model structure First, a student model is built for embedded deployment. Compared to the teacher model, the student model is simplified in terms of network layers, parameter size, and computational complexity.

[0087] The student model mainly includes the following modules: Input encoding module: used to receive the multimodal input features from step three and perform preliminary feature encoding.

[0088] Feature fusion module: used to fuse feature information from different sources to form a comprehensive feature representation.

[0089] State prediction module: Used to output the state prediction results of the dam structure based on the fusion features.

[0090] Let the prediction function of the student model be:
$$\hat{y}_t^{S} = f_S(X_t)$$
where $f_S(\cdot)$ is the prediction function of the student model, $X_t$ is the multimodal input at time $t$, and $\hat{y}_t^{S}$ is the structural state prediction made by the student model at time $t$.

[0091] (2) Predictive distillation training In order for the student model to learn the predictive ability of the teacher model, it is necessary to use the distillation labels in step three to supervise the training of the student model.

[0092] The goal of prediction distillation is to make the student model's output as close as possible to the teacher model's prediction. The prediction distillation loss can be expressed as:
$$L_{\mathrm{pred}} = \ell\left(\hat{y}_t^{S},\ y_t^{\mathrm{dis}}\right)$$
where $L_{\mathrm{pred}}$ is the prediction distillation loss, $\ell(\cdot,\cdot)$ measures the discrepancy between the two distributions, $\hat{y}_t^{S}$ is the student model's prediction, and $y_t^{\mathrm{dis}}$ is the distillation label generated by the teacher model.

[0093] By minimizing the prediction distillation loss, the student model gradually learns the risk judgment ability of the teacher model.
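The source leaves the form of the prediction distillation loss open. A common choice, assumed here, is the Kullback-Leibler divergence from the teacher's distribution to the student's; the sketch computes it directly over the four risk-level probabilities. Function and parameter names are hypothetical.

```python
import math
from typing import List


def prediction_distill_loss(student_probs: List[float],
                            teacher_probs: List[float],
                            eps: float = 1e-12) -> float:
    """KL(teacher || student) over the risk-level distribution.

    Zero when the student reproduces the teacher's distribution exactly
    and positive otherwise; eps guards against log(0).
    """
    return sum(
        p * (math.log(p + eps) - math.log(q + eps))
        for p, q in zip(teacher_probs, student_probs)
        if p > 0  # terms with zero teacher probability contribute nothing
    )
```

Because the loss compares whole distributions, the student is penalized not only for choosing the wrong risk level but also for misjudging how likely the neighboring levels are.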

[0094] (3) Feature distillation training In addition to the prediction results, the student model also needs to learn the feature representation capability of the teacher model. Therefore, a feature distillation mechanism is further introduced into the distillation training process.

[0095] Suppose that the feature representation output by the student model at the corresponding layer is $F_t^{S}$. The feature distillation loss can then be expressed as:
$$L_{\mathrm{feat}} = d\left(F_t^{S},\ F_t^{T}\right)$$
where $F_t^{T}$ is the teacher model feature extracted in step three, $F_t^{S}$ is the feature representation of the corresponding layer in the student model, and $d(\cdot,\cdot)$ is a feature distance measure.

[0096] By training with feature distillation, student models can learn the feature representation methods of teacher models, thereby improving the model's ability to understand multimodal data.
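The distance used for feature distillation is not specified in the source; mean squared error between matched feature vectors is assumed below. The second function applies one gradient-descent step directly to the student feature vector to show the direction in which training pulls it; in a real network the gradient would flow into the student's weights instead. Names are hypothetical.

```python
from typing import List


def feature_distill_loss(student_feat: List[float], teacher_feat: List[float]) -> float:
    """Mean squared error between matched student and teacher features.

    Assumes both vectors have the same width; when they differ, a small
    learned projection on the student side is typically added first.
    """
    if len(student_feat) != len(teacher_feat):
        raise ValueError("feature widths differ; add a projection layer first")
    n = len(teacher_feat)
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / n


def feature_grad_step(student_feat: List[float], teacher_feat: List[float],
                      lr: float = 0.1) -> List[float]:
    """One gradient-descent step on the MSE loss w.r.t. the student features."""
    n = len(teacher_feat)
    return [s - lr * 2.0 * (s - t) / n for s, t in zip(student_feat, teacher_feat)]
```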

[0097] (4) Multi-task distillation training In dam safety monitoring scenarios, monitoring models not only need to classify structural states but also identify abnormal trends. Therefore, a multi-task training mechanism is further introduced during the distillation training process, enabling the student model to learn multiple monitoring tasks simultaneously.

[0098] The multi-task distillation loss can be expressed as:
$$L_{\mathrm{task}} = \alpha L_{\mathrm{state}} + \beta L_{\mathrm{trend}}$$
where $L_{\mathrm{state}}$ is the structural state recognition loss, $L_{\mathrm{trend}}$ is the structural change trend prediction loss, and $\alpha$ and $\beta$ are weighting coefficients.

[0099] Through multi-task distillation training, student models can achieve good learning results in both structural state recognition and trend prediction.

[0100] (5) Comprehensive distillation optimization During training, the prediction distillation loss, feature distillation loss, and multi-task loss are optimized jointly, giving the overall training objective of the student model:
$$L_{\mathrm{total}} = \lambda_1 L_{\mathrm{pred}} + \lambda_2 L_{\mathrm{feat}} + \lambda_3 L_{\mathrm{task}}$$
where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the loss weighting coefficients.

[0101] By continuously optimizing the above objective function, the student model can gradually approach the reasoning ability of the teacher model.
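The multi-task loss and the combined objective can be sketched as weighted sums, assuming cross-entropy for structural state recognition and mean squared error over a short forecast horizon for trend prediction. Neither loss form nor the default weights come from the source; they are placeholders to be tuned, and all names are hypothetical.

```python
import math
from typing import List, Tuple


def cross_entropy(probs: List[float], target: int, eps: float = 1e-12) -> float:
    """Structural state recognition loss against a ground-truth class index."""
    return -math.log(probs[target] + eps)


def trend_loss(pred_trend: List[float], true_trend: List[float]) -> float:
    """Trend-prediction loss as mean squared error over a forecast horizon."""
    return sum((p - t) ** 2 for p, t in zip(pred_trend, true_trend)) / len(true_trend)


def multitask_loss(l_state: float, l_trend: float, alpha: float, beta: float) -> float:
    """Weighted sum of the state recognition and trend prediction losses."""
    return alpha * l_state + beta * l_trend


def total_loss(l_pred: float, l_feat: float, l_task: float,
               weights: Tuple[float, float, float] = (1.0, 0.5, 1.0)) -> float:
    """Weighted sum of prediction distillation, feature distillation and multi-task losses."""
    w_pred, w_feat, w_task = weights
    return w_pred * l_pred + w_feat * l_feat + w_task * l_task
```

In practice the weights are often chosen so that the three terms have comparable magnitudes early in training; otherwise one term can dominate the gradient.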

[0102] After multi-stage distillation training, the student model has learned the analytical capabilities of the teacher model with a relatively small parameter scale. In the next step, the trained student model will undergo structural compression and embedded optimization to enable stable operation on edge devices, thereby achieving real-time data analysis and anomaly identification for the dam safety monitoring system.

[0103] (v) Step 5: Embedded Deployment Constraint Optimization After completing the multi-stage distillation training of the student model, it is necessary to optimize the trained model for embedded deployment to ensure stable operation in edge device environments. Since embedded devices have limitations in computing power, storage capacity, and power consumption, structural and computational optimizations of the student model are required to meet the operational requirements of edge devices.

[0104] This step introduces embedded deployment constraints to compress parameters, optimize computational structure, and adapt operators in the student model, thereby generating a lightweight model suitable for deployment on edge devices.

[0105] (1) Model parameter compression First, parameter compression is performed on the student model to reduce redundant parameters and improve model efficiency. Common parameter compression methods include model pruning and weight compression.

[0106] Let the parameter set of the student model be:
$$\Theta = \left\{\theta_1,\ \theta_2,\ \dots,\ \theta_n\right\}$$
where $n$ is the number of model parameters and $\theta_i$ is the $i$-th parameter of the model.

[0107] Parameter pruning removes parameters that contribute little to the model's predictions, yielding a compressed parameter set:
$$\Theta' \subset \Theta$$
where $\Theta'$ is the compressed set of model parameters.

[0108] Parameter compression can effectively reduce the computational complexity and storage requirements of the model.
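A minimal sketch of pruning, assuming parameter magnitude as the proxy for "contribution to the model's predictions" (the source does not name a criterion). Pruned entries are zeroed, which is how unstructured pruning is usually represented before a sparse or compacted storage format removes them. The function name is hypothetical.

```python
from typing import List


def magnitude_prune(params: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of parameters with the smallest |theta_i|.

    Ties at the threshold are broken by position (sorted() is stable), so
    exactly round(sparsity * n) parameters are pruned.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = round(sparsity * len(params))
    order = sorted(range(len(params)), key=lambda i: abs(params[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else p for i, p in enumerate(params)]
```

After pruning, the student is usually fine-tuned briefly (for example with the distillation losses above) to recover any accuracy lost.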

[0109] (2) Model quantization optimization After parameter compression, the model parameters are quantized to further reduce model storage space and computational overhead. Quantization is typically achieved by reducing parameter precision, for example, converting high-precision floating-point parameters to low-precision integer representations.

[0110] Let the weights of the original model be $W$. The quantized weights can be expressed as:
$$W_q = Q(W)$$
where $Q(\cdot)$ is the quantization function and $W_q$ is the quantized model weight. Quantization significantly reduces model storage space and improves computational efficiency on embedded hardware.
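One concrete instance of the quantization function, assumed here for illustration, is symmetric per-tensor int8 quantization: a single scale factor maps the largest absolute weight to 127, and each weight is rounded to the nearest integer step, so the reconstruction error of any weight is at most half a step. Function names are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: q = clip(round(w / s), -127, 127).

    The scale s maps the largest |w| onto 127, so each stored weight fits
    in one signed byte instead of a 4-byte float32.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Approximate reconstruction of the original float weights."""
    return [v * scale for v in q]
```

Going from float32 to int8 cuts weight storage by a factor of four; per-channel scales are a common refinement when weight ranges differ widely between channels.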

[0111] (3) Optimization of computational graph structure To further improve the model's operating efficiency on embedded devices, the model computation graph needs to be structurally optimized. Specifically, this involves simplifying and merging the network computation process, reducing the number of computation nodes, and lowering the computational overhead during model inference.

[0112] The optimized computational structure can be represented as:
$$G' = \Phi(G)$$
where $G$ is the computational graph of the original model, $\Phi(\cdot)$ is the computational graph optimization function, and $G'$ is the optimized model computation structure.

[0113] Optimizing computational graphs can reduce computational steps in the model inference process and improve model running efficiency.
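A classic example of merging computation nodes is folding an inference-mode batch normalization into the preceding linear (or convolution) layer, so that two nodes become one with identical outputs. The source does not say which merges are performed; this is an illustrative assumption, and the test below checks the equivalence numerically. Function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def fold_batchnorm(w: Matrix, b: Vector, gamma: Vector, beta: Vector,
                   mean: Vector, var: Vector, eps: float = 1e-5) -> Tuple[Matrix, Vector]:
    """Merge Linear -> BatchNorm (inference mode) into a single Linear node."""
    w_f, b_f = [], []
    for row, bi, g, bt, mu, v in zip(w, b, gamma, beta, mean, var):
        k = g / math.sqrt(v + eps)          # per-output-channel scale
        w_f.append([k * wij for wij in row])
        b_f.append(k * (bi - mu) + bt)
    return w_f, b_f


def linear(w: Matrix, b: Vector, x: Vector) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def batchnorm(y: Vector, gamma: Vector, beta: Vector, mean: Vector,
              var: Vector, eps: float = 1e-5) -> Vector:
    return [g * (yi - mu) / math.sqrt(v + eps) + bt
            for yi, g, bt, mu, v in zip(y, gamma, beta, mean, var)]
```

Folding removes one node and one pass over the activations per layer at no cost in accuracy, because the normalization statistics are constants at inference time.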

[0114] (4) Embedded operator adaptation Different embedded devices support different types of computation operators, so it is necessary to adapt the model's computation operators to ensure that they can run stably on the target hardware platform.

[0115] In actual deployment, the general computational operators in the model are converted into operator forms supported by the target hardware platform, for example: ARM CPU operators, NPU accelerated operators, and GPU accelerated operators.

[0116] Operator adaptation processing allows the model to fully utilize the hardware acceleration capabilities of embedded devices, thereby improving inference efficiency.
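Operator adaptation can be pictured as a placement pass over the graph: each operator is assigned to the first target backend in a preference list that supports it, falling back toward the general-purpose CPU. The backend names and capability tables below are hypothetical and do not describe any particular chip or framework.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical capability tables: which operator types each backend can run.
BACKEND_OPS: Dict[str, Set[str]] = {
    "npu": {"Conv", "MatMul", "Relu", "Add"},
    "gpu": {"Conv", "MatMul", "Relu", "Add", "Softmax", "LayerNorm"},
    "arm_cpu": {"Conv", "MatMul", "Relu", "Add", "Softmax", "LayerNorm", "Gelu"},
}


def assign_backends(graph_ops: List[str], preference: List[str]) -> List[Tuple[str, str]]:
    """Place each operator on the first preferred backend that supports it.

    Raises ValueError if no backend supports an operator, which signals that
    the operator must be rewritten (e.g. decomposed) before deployment.
    """
    plan = []
    for op in graph_ops:
        for backend in preference:
            if op in BACKEND_OPS.get(backend, set()):
                plan.append((op, backend))
                break
        else:  # loop finished without break: no backend found
            raise ValueError(f"no target backend supports operator {op!r}")
    return plan
```

Frequent switching between backends costs data transfers, so real toolchains usually also try to keep consecutive operators on the same accelerator.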

[0117] (5) Generation of embedded deployment model After parameter compression, model quantization, computation graph optimization, and operator adaptation, a lightweight model suitable for deployment on edge devices can be generated.

[0118] Let the final generated embedded deployment model be $M_{\mathrm{opt}}$, a lightweight model suitable for operation on embedded devices.

[0119] This model can perform real-time data analysis and structural state identification tasks on embedded devices, and meets the real-time and stability requirements of dam safety monitoring systems.

[0120] By optimizing the embedded deployment constraints in step five, a lightweight model that meets the operational requirements of edge devices can be obtained. In the next step, the optimized model will be deployed to the edge monitoring device, enabling it to perform real-time analysis of the collected dam monitoring data, thereby achieving structural anomaly identification and risk warning.

[0121] (vi) Step 6: Generate and deploy the embedded lightweight model After completing the model compression, quantization, and computational graph optimization in step five, a lightweight model suitable for embedded devices can be generated and deployed to the edge computing devices of the dam safety monitoring system. Embedded deployment allows the monitoring system to directly perform data analysis and risk identification tasks in the field environment, thereby improving the real-time performance and reliability of the dam safety monitoring system.

[0122] This process mainly includes three stages: lightweight model generation, embedded runtime environment adaptation, and model deployment and operation.

[0123] (1) Embedded lightweight model generation First, the optimized student model is converted into an embedded inference model format to meet the operational requirements of embedded devices. The model conversion process mainly includes model structure freezing, parameter fixing, and inference graph generation.

[0124] Let the optimized model from step five be $M_{\mathrm{opt}}$. The embedded inference model is obtained through the model transformation function:
$$M_{\mathrm{emb}} = \mathcal{C}(M_{\mathrm{opt}})$$
where $M_{\mathrm{opt}}$ is the student model after compression and optimization, $\mathcal{C}(\cdot)$ is the model transformation function, and $M_{\mathrm{emb}}$ is the final embedded lightweight model.

[0125] In practical engineering, model conversion can adopt common inference model formats, such as ONNX, TensorRT, or other embedded inference framework formats, to ensure that the model can run efficiently in the embedded hardware environment.
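The structure-freezing, parameter-fixing, and inference-graph-generation steps of the model conversion can be illustrated with the Python standard library: parameters are copied into read-only tuples behind a read-only mapping, and `graphlib.TopologicalSorter` (Python 3.9+) produces a static execution order. This is a format-neutral sketch, not an ONNX or TensorRT export; the function and node names are hypothetical.

```python
from graphlib import TopologicalSorter
from types import MappingProxyType
from typing import Dict, List, Mapping, Tuple


def freeze_model(params: Dict[str, List[float]],
                 deps: Dict[str, List[str]]
                 ) -> Tuple[Mapping[str, Tuple[float, ...]], List[str]]:
    """Fix parameters as read-only tuples and emit a static execution order.

    params maps node name -> weights; deps maps node name -> names of the
    nodes whose outputs it consumes. A cycle raises graphlib.CycleError.
    """
    frozen = MappingProxyType({k: tuple(v) for k, v in params.items()})
    order = list(TopologicalSorter(deps).static_order())
    return frozen, order
```

Computing the execution order once at conversion time means the edge runtime only has to walk a fixed list, with no graph analysis on the device.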

[0126] (2) Embedded runtime environment adaptation To ensure the stable operation of the lightweight model, it is necessary to adapt the software and hardware environment of the embedded device. An embedded runtime environment typically includes the following components: Embedded operating system, inference computing framework, hardware acceleration module.

[0127] Let the embedded device environment be:
$$E = \left\{E_{\mathrm{hw}},\ E_{\mathrm{sw}},\ E_{\mathrm{rt}}\right\}$$
where $E_{\mathrm{hw}}$ is the embedded hardware platform, $E_{\mathrm{sw}}$ is the embedded software system, and $E_{\mathrm{rt}}$ is the inference runtime environment.

[0128] By adapting the runtime environment, it can be ensured that the embedded lightweight model can stably perform inference tasks on the target device.

[0129] (3) Real-time inference on monitoring data Once the model is deployed, edge devices can perform real-time analysis of the dam monitoring data collected on-site. The monitoring system acquires dam status data through sensors and image acquisition devices, and inputs the data into the embedded model for inference calculations.

[0130] Let the input monitoring data be $X_t$, the monitoring data collected in real time at time $t$.

[0131] The inference process of the embedded lightweight model can be represented as:
$$Y_t = M_{\mathrm{emb}}(X_t)$$
where $M_{\mathrm{emb}}$ is the embedded lightweight model and $Y_t$ is the monitoring result output by the model, which can be expressed as a structural state identification result or an anomaly risk score. Through real-time inference, the system can quickly identify the dam's operational status and determine whether potential safety risks exist.
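The real-time inference step can be sketched as a loop that times each call. The 200 ms default budget mirrors the single-inference latency reported for the ARM test platform in the embodiment; the `model` argument is a placeholder for the deployed embedded model, and the function name is hypothetical.

```python
import time
from typing import Callable, Iterable, List, Tuple

Vector = List[float]


def run_edge_inference(model: Callable[[Vector], Vector],
                       stream: Iterable[Vector],
                       budget_ms: float = 200.0) -> List[Tuple[Vector, float, bool]]:
    """Run the embedded model on each incoming monitoring sample.

    Returns (output, latency_ms, within_budget) for every sample, so that
    latency violations can be logged alongside the results.
    """
    results = []
    for x_t in stream:
        start = time.perf_counter()  # monotonic, high-resolution clock
        y_t = model(x_t)
        latency_ms = (time.perf_counter() - start) * 1000.0
        results.append((y_t, latency_ms, latency_ms <= budget_ms))
    return results
```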

[0132] (4) Anomaly identification and early warning output After obtaining the model's prediction results, the system needs to assess the risks associated with the monitoring results. When the model's output exceeds a preset threshold, the system will automatically trigger an early warning mechanism.

[0133] Let the risk assessment be:
$$A_t = \mathcal{R}(Y_t)$$
where $\mathcal{R}(\cdot)$ is the risk assessment function and $A_t$ is the anomaly detection result. When $A_t$ meets the preset conditions, the system automatically generates early warning information and sends early warning prompts to management personnel through the monitoring platform, thereby realizing intelligent monitoring and risk warning of the dam's operating status.
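A possible risk assessment function, assumed for illustration: collapse the four-level probability vector (normal, attention, warning, alarm) into an expected-severity score in [0, 1] and issue a warning when it exceeds a preset threshold. The severity weights and threshold are hypothetical and would in practice be set from the dam's safety criteria.

```python
from typing import List, Optional

SEVERITY = [0.0, 1.0 / 3.0, 2.0 / 3.0, 1.0]  # hypothetical weights, normal..alarm


def risk_score(probs: List[float]) -> float:
    """Expected severity of the predicted state distribution, in [0, 1]."""
    return sum(w * p for w, p in zip(SEVERITY, probs))


def assess(probs: List[float], threshold: float = 0.5) -> Optional[str]:
    """Return an early-warning message when the score exceeds the threshold."""
    score = risk_score(probs)
    if score > threshold:
        return f"EARLY WARNING: risk score {score:.2f} exceeds {threshold:.2f}"
    return None
```

Using the whole distribution rather than only the top class lets a growing probability of the warning and alarm states raise the score before they become the most likely outcome.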

[0134] (5) Cloud-edge collaborative update mechanism During system operation, edge devices will continuously collect new monitoring data. When the system detects complex anomalies or a decrease in recognition accuracy, the relevant data can be uploaded to the cloud platform for model update and training.

[0135] The updated model in the cloud can be re-generated into a new lightweight model through distillation optimization and embedded deployment processes, and then distributed to edge devices to achieve continuous model optimization.

[0136] By establishing a collaborative mechanism between cloud-based training and edge deployment, the identification capabilities and early warning accuracy of the dam safety monitoring system can be continuously improved.
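The trigger for cloud-edge updating could be implemented as below. Because ground truth is usually unavailable on site, this sketch assumes that the rolling mean of the model's top-class confidence serves as a proxy for a decrease in recognition accuracy, and it buffers low-confidence samples for upload. The window size, threshold, and class name are hypothetical.

```python
from collections import deque
from typing import Deque, List


class UpdateTrigger:
    """Decide when an edge device should upload data for cloud retraining."""

    def __init__(self, window: int = 100, min_confidence: float = 0.7):
        self.recent: Deque[float] = deque(maxlen=window)
        self.min_confidence = min_confidence
        self.upload_buffer: List[List[float]] = []

    def observe(self, x_t: List[float], probs: List[float]) -> bool:
        """Record one inference; return True when an upload should be triggered."""
        confidence = max(probs)
        self.recent.append(confidence)
        if confidence < self.min_confidence:
            self.upload_buffer.append(x_t)  # keep ambiguous samples for the cloud
        full = len(self.recent) == self.recent.maxlen
        return full and sum(self.recent) / len(self.recent) < self.min_confidence
```

Requiring a full window before triggering avoids uploads caused by a single noisy reading.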

[0137] Through the above steps, the complete technical process from distillation of ultra-large parameter teacher models to generation and deployment of embedded lightweight models can be completed, thereby enabling the dam safety monitoring system to achieve efficient and real-time structural status identification and risk warning in an edge device environment.

[0138] Example 1: In a safety monitoring system for a large reservoir dam, multimodal monitoring equipment is deployed to acquire real-time information on the dam's operational status. This monitoring equipment includes machine vision monitoring terminals, dam deformation monitoring sensors, seepage monitoring sensors, and environmental monitoring sensors. The machine vision monitoring terminal uses an industrial camera to continuously acquire images of the dam's surface structure; the sensor equipment collects operational data such as dam displacement, seepage pressure, temperature, and environmental parameters.

[0139] During system operation, image data of the dam body and data from various types of sensors are first collected through on-site monitoring equipment, and the collected data is then uploaded to the data processing platform. The collected data undergoes preprocessing, including data denoising, outlier filtering, time series alignment, and image resolution standardization, thereby constructing a multimodal monitoring dataset in a unified format.
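Two of the preprocessing operations in this example, outlier filtering and time-series alignment, can be sketched with the standard library. The median/MAD rule and linear interpolation onto a shared time grid are assumed choices; the source names the operations but not the algorithms, and the function names are hypothetical.

```python
from bisect import bisect_left
from statistics import median
from typing import List, Sequence


def filter_outliers(values: List[float], k: float = 3.0) -> List[float]:
    """Replace points more than k scaled MADs from the median with the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return list(values)
    scale = 1.4826 * mad  # MAD -> standard deviation under a normal assumption
    return [med if abs(v - med) > k * scale else v for v in values]


def align(times: Sequence[float], values: Sequence[float],
          grid: Sequence[float]) -> List[float]:
    """Linearly interpolate one sensor's series onto a common time grid.

    times must be strictly increasing; grid points outside the sampled
    range take the nearest endpoint value.
    """
    out = []
    for g in grid:
        i = bisect_left(times, g)
        if i == 0:
            out.append(values[0])
        elif i == len(times):
            out.append(values[-1])
        else:
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            out.append(v0 + (v1 - v0) * (g - t0) / (t1 - t0))
    return out
```

Aligning every sensor to the same grid (for example, the image acquisition timestamps) is what allows image and sensor features for the same instant to be paired in the multimodal dataset.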

[0140] Subsequently, a high-parameter teacher model is constructed on a cloud server and used to perform structural state analysis on the monitoring data. The teacher model performs deep feature extraction and semantic reasoning on the dam monitoring data, outputs predictions of the dam's operational status, and generates corresponding risk assessment information. During inference, the teacher model's prediction results, feature representation information, and inference explanation chains are recorded simultaneously, forming the supervisory information for distillation training.

[0141] After obtaining the above information, a distillation training sample set is constructed using monitoring data and the inference results of the teacher model. This distillation sample set includes the original monitoring data, the structural state prediction labels output by the teacher model, and the feature representations of the intermediate layers of the model. The student model is trained using this distillation sample set, enabling it to learn the risk discrimination ability and feature representation methods of the teacher model.

[0142] During the model training phase, a multi-stage distillation training method is used to optimize the student model. First, the prediction results output by the teacher model are used to train the student model through prediction distillation. Then, through feature distillation, the student model learns the intermediate feature representations of the teacher model. Finally, through multi-task training, the student model is equipped with both structural state recognition and abnormal trend recognition capabilities, thereby improving the overall monitoring performance of the model.

[0143] After completing the distillation training, the student model undergoes embedded deployment optimization. This includes steps such as model parameter compression, weight quantization, and computational graph structure optimization. Through these optimizations, the model parameter size and computational complexity can be significantly reduced, enabling the model to adapt to the computational resource conditions of embedded devices.

[0144] After model optimization, the optimized student model is converted into an embedded inference model and deployed to an edge monitoring device. This edge device can be an ARM-based embedded computing platform equipped with image acquisition and sensor data acquisition interfaces. The edge device collects dam monitoring data in real time and uses the embedded lightweight model for online inference analysis, thereby achieving real-time monitoring of the dam's operational status.

[0145] When the model detects abnormalities in the dam structure or risk indicators exceeding preset thresholds, the system automatically generates early warning information and sends alarm prompts to management personnel through the monitoring platform, thereby realizing intelligent monitoring and risk warning of the dam's safe operation status.

[0146] In practical engineering testing, the distilled embedded lightweight model was deployed on an ARM architecture edge device for testing. Test results show that the model's single inference latency on the ARM platform is less than 200 ms, which meets the real-time analysis requirements of the dam safety monitoring system. Furthermore, in actual monitoring data testing, the model achieved an accuracy rate of over 90% in identifying abnormal states in the dam structure, verifying the feasibility and effectiveness of the method in engineering applications.

[0147] Through the above implementation methods, the present invention realizes knowledge transfer between cloud-based ultra-large parameter models and edge-embedded lightweight models, enabling the dam safety monitoring system to significantly reduce computing resource requirements while ensuring monitoring accuracy, thereby achieving efficient and real-time safety monitoring and risk warning in an embedded device environment.

[0148] The working principle of this application is as follows: First, a training dataset was constructed by collecting multi-source monitoring data during the dam's operation. The monitoring data included various types such as dam deformation monitoring data, seepage monitoring data, environmental monitoring data, and image monitoring data. Through unified preprocessing and feature extraction of the multi-source data, multimodal feature data suitable for model training was generated, thus providing a data foundation for subsequent model training. Building upon this foundation, a high-parameter teacher model is constructed in the cloud. This teacher model possesses strong data representation and reasoning capabilities, enabling deep feature extraction and structural state analysis of dam monitoring data. By performing inference calculations on training data using the teacher model, structural state prediction results and multi-layered feature representation information can be generated. These elements collectively constitute the knowledge information required for distillation training. Subsequently, the training data is inferred using the teacher model to construct a distilled training sample set. The distilled samples include not only the original monitoring data but also the predicted labels and intermediate feature representations generated by the teacher model. In this way, the complex knowledge contained in the teacher model can be transformed into supervised information that can be used for student model learning. After obtaining the distillation training sample set, a student model with a small parameter size is constructed, and a multi-stage distillation training method is used to enable the student model to learn the reasoning ability of the teacher model. 
During distillation training, the student model learns not only the prediction results of the teacher model but also its feature representations and structural state judgment rules, so it maintains high recognition accuracy despite its small parameter size. After distillation training is complete, the student model is optimized under embedded deployment constraints. Model parameter compression, weight quantization, and computational graph optimization further reduce the parameter size and computational complexity of the model, enabling it to run stably on resource-constrained embedded devices. Finally, the optimized model is converted into an embedded lightweight model and deployed to the edge devices of the dam safety monitoring system. When the field monitoring equipment collects new monitoring data, the embedded model analyzes the data in real time, outputs the dam structural state identification result, and automatically assesses the risk based on that result. When the monitoring results exceed the safety threshold, the system automatically generates early warning information, thereby realizing intelligent monitoring and risk warning of the dam's operating state. Through the above technical process, the present invention transfers knowledge from the cloud-based ultra-large parameter model to the edge embedded model, so that the dam safety monitoring system substantially reduces its computing resource requirements while maintaining high recognition accuracy, making real-time, efficient safety monitoring and early warning possible on embedded devices. Compared with existing technologies, the invention constructs a teacher model with ultra-large parameters and adopts a multi-stage knowledge distillation method, enabling the embedded lightweight model to inherit the reasoning ability of the teacher model. While ensuring monitoring accuracy, it significantly reduces the computational complexity of the model and therefore has high engineering application value.

[0149] The above description is merely a specific embodiment of this application, enabling those skilled in the art to understand or implement this application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features claimed herein.

Claims

1. A method for distilling an embedded lightweight model using a large parameter model for dam safety monitoring, characterized in that it comprises the following steps:
S1. Collecting multimodal data for dam safety monitoring and constructing a monitoring dataset, wherein the multimodal data includes dam image data and sensor time-series monitoring data, and performing data cleaning, time synchronization and feature normalization on the multimodal data to form a standardized monitoring dataset;
S2. Constructing a high-parameter teacher model in the cloud based on the monitoring dataset, performing feature extraction and structural state identification on the multimodal data, and outputting dam operating state prediction results and the corresponding intermediate feature representations;
S3. Using the teacher model to perform inference on the monitoring dataset, and recording the prediction results and intermediate-layer feature representations output by the teacher model to construct a distillation training sample set;
S4. Constructing a student model and performing multi-stage distillation training on the student model based on the distillation training sample set, so that the student model learns the prediction results and feature representations of the teacher model;
S5. Performing embedded deployment constraint optimization on the trained student model, including model parameter compression, weight quantization and computational structure optimization, to generate an embedded lightweight model;
S6. Deploying the embedded lightweight model to an edge monitoring device, collecting monitoring data in real time, and calling the embedded lightweight model to perform inference, so as to identify and give early warning of dam structural anomalies.

2. The method for distilling an embedded lightweight model using a large parameter model for dam safety monitoring as described in claim 1, characterized in that: The multimodal data includes dam surface image data and sensor time-series monitoring data. The dam surface image data is acquired by machine vision monitoring terminals installed at key structural locations of the dam. These terminals include industrial cameras, image acquisition modules, and edge processing modules, used for continuous image acquisition of dam surface cracks, seepage traces, structural displacement characteristics, and surface morphology changes. The sensor time-series monitoring data is acquired by various types of monitoring sensors installed inside or on the dam structure. These sensors include one or more of displacement sensors, stress sensors, strain sensors, seepage pressure sensors, and temperature sensors, used to acquire information about the dam structure's operational status. The machine vision monitoring terminals and sensors transmit the acquired data to a data processing system via a monitoring network, thereby forming multimodal monitoring data containing both image and structural monitoring information. The dam surface image data and sensor time-series monitoring data are stored together using a unified timestamp, device number, and monitoring point number, forming a multimodal sample record corresponding to the monitoring dataset.
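
As a rough illustration of the unified storage described in claim 2, the sketch below groups image captures and sensor readings into one record per timestamp and monitoring point, while keeping the device numbers that contributed to it. The record layout and the names `MultimodalRecord` and `build_records` are hypothetical, and the sketch assumes timestamps have already been aligned by the synchronization step of claim 3.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set, Tuple

@dataclass
class MultimodalRecord:
    """One multimodal sample: all data for one monitoring point at one aligned timestamp."""
    timestamp: int                                      # unified timestamp (unit assumed: seconds)
    point_id: str                                       # monitoring point number
    device_ids: Set[str] = field(default_factory=set)   # device numbers that contributed data
    image_path: Optional[str] = None                    # reference to the stored dam-surface image
    sensor_values: Dict[str, float] = field(default_factory=dict)  # sensor name -> reading

def build_records(
    images: Iterable[Tuple[int, str, str, str]],           # (timestamp, device_id, point_id, image_path)
    readings: Iterable[Tuple[int, str, str, str, float]],  # (timestamp, device_id, point_id, sensor, value)
) -> List[MultimodalRecord]:
    records: Dict[Tuple[int, str], MultimodalRecord] = {}

    def get(ts: int, point: str) -> MultimodalRecord:
        # Create the record for this (timestamp, point) pair on first use.
        key = (ts, point)
        if key not in records:
            records[key] = MultimodalRecord(ts, point)
        return records[key]

    for ts, dev, point, path in images:
        rec = get(ts, point)
        rec.image_path = path
        rec.device_ids.add(dev)
    for ts, dev, point, name, value in readings:
        rec = get(ts, point)
        rec.sensor_values[name] = value
        rec.device_ids.add(dev)
    # Order chronologically, then by monitoring point, so the dataset is reproducible.
    return sorted(records.values(), key=lambda r: (r.timestamp, r.point_id))
```

Keying on the monitoring point rather than the device number lets a camera and several sensors at the same location share one sample, while the device numbers remain available for tracing.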

3. The method for distilling an embedded lightweight model using a large parameter model for dam safety monitoring as described in claim 1, characterized in that: The data preprocessing process in step S1 includes multimodal data cleaning, data synchronization, and feature normalization. The image data preprocessing process includes image noise filtering, image resolution unification, and image brightness and contrast adjustment to eliminate image quality differences caused by changes in ambient lighting or equipment noise during the acquisition process. The sensor monitoring data preprocessing process includes outlier detection and removal, missing data interpolation, and time series smoothing to eliminate sensor measurement errors and data fluctuations. After data cleaning is completed, the image data and sensor data are synchronized in a unified time dimension using the timestamp information recorded by the monitoring device. The processed multimodal data is then normalized to represent different types of monitoring data under a unified feature scale.
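
The sensor-side preprocessing of claim 3 can be sketched in plain Python as below. The specific techniques are assumptions rather than the application's stated method: outliers are flagged with a median-absolute-deviation rule, gaps are filled by linear interpolation, smoothing is a centered moving average, normalization is min-max scaling, and time synchronization maps each image timestamp to the nearest sensor sample. All function names are hypothetical.

```python
import statistics
from bisect import bisect_left
from typing import List, Optional, Sequence

def remove_outliers(values: Sequence[Optional[float]], k: float = 3.0) -> List[Optional[float]]:
    """Mark readings more than k robust standard deviations from the median as missing (None)."""
    present = [v for v in values if v is not None]
    med = statistics.median(present)
    mad = statistics.median([abs(v - med) for v in present])
    scale = 1.4826 * mad or 1e-12  # 1.4826 * MAD estimates the std-dev for normal data
    return [v if v is not None and abs(v - med) <= k * scale else None for v in values]

def interpolate_missing(values: Sequence[Optional[float]]) -> List[float]:
    """Fill None gaps linearly; leading or trailing gaps copy the nearest valid value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no valid samples")
    out: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            out.append(v)
            continue
        j = bisect_left(known, i)  # position of the first valid sample after index i
        if j == 0:
            out.append(values[known[0]])
        elif j == len(known):
            out.append(values[known[-1]])
        else:
            lo, hi = known[j - 1], known[j]
            t = (i - lo) / (hi - lo)
            out.append(values[lo] + t * (values[hi] - values[lo]))
    return out

def moving_average(values: Sequence[float], window: int = 3) -> List[float]:
    """Centered moving average; the window shrinks at both ends of the series."""
    half = window // 2
    out = []
    for i in range(len(values)):
        seg = values[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out

def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale a series to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def nearest_index(sorted_ts: Sequence[float], t: float) -> int:
    """Index of the sensor sample whose timestamp is closest to t (aligns an image frame)."""
    j = bisect_left(sorted_ts, t)
    if j == 0:
        return 0
    if j == len(sorted_ts):
        return len(sorted_ts) - 1
    return j if sorted_ts[j] - t < t - sorted_ts[j - 1] else j - 1
```

A median-based rule is used here because a single extreme reading inflates the ordinary standard deviation enough to hide itself; with the MAD it stays detectable.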

4. The method for distilling an embedded lightweight model using a large parameter model for dam safety monitoring as described in claim 1, characterized in that: The teacher model is a high-parameter deep learning model deployed on a cloud computing platform and consists of a multimodal feature extraction network and a multimodal fusion network. Features are extracted from the image data by a convolutional neural network or a vision Transformer network to obtain dam image feature vectors. Features are extracted from the sensor time-series data by a temporal neural network, namely a long short-term memory network, a gated recurrent unit network, or a time-series Transformer network, to capture how the dam monitoring data vary over time. After the image and temporal features are obtained, the multimodal fusion network jointly models the features of the different modalities to identify the dam operating state. The teacher model is trained on a high-performance cloud computing platform, and its parameters are optimized using a large amount of historical monitoring data, yielding a structural state identification model that outputs dam operating state prediction results and intermediate feature representations and serves as the knowledge source for distillation training of the student model.
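
Claim 4 leaves the design of the fusion network open. A minimal sketch of one possible choice, late fusion by concatenating the image and temporal feature vectors followed by a single linear classification layer, shows the two outputs the teacher is said to provide: a probability distribution over operating states and an intermediate (fused) feature vector. The upstream CNN/ViT and LSTM/GRU/Transformer encoders are assumed to have already produced the input vectors; `fuse_and_classify` is a hypothetical name, and a real teacher would be far larger.

```python
import math
from typing import List, Sequence, Tuple

def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_and_classify(
    image_feat: Sequence[float],         # image-encoder output (CNN or ViT), assumed given
    temporal_feat: Sequence[float],      # time-series-encoder output (LSTM/GRU/Transformer), assumed given
    weights: Sequence[Sequence[float]],  # one row per operating-state class
    bias: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Late fusion by concatenation plus one linear layer.

    Returns (class probabilities, fused intermediate feature); both are what the
    distillation step later records from the teacher.
    """
    fused = list(image_feat) + list(temporal_feat)
    if len(weights) != len(bias) or any(len(row) != len(fused) for row in weights):
        raise ValueError("weight/bias shapes do not match the fused feature")
    logits = [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(weights, bias)]
    return softmax(logits), fused
```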

5. The method for distilling an embedded lightweight model using a large parameter model for dam safety monitoring as described in claim 1, characterized in that: The distillation training sample set is generated through inference by the teacher model. Its construction includes inputting the preprocessed multimodal monitoring data into the teacher model for inference, obtaining the dam structural state prediction result output by the teacher model, and recording the intermediate-layer feature representation of the teacher model. The prediction result includes the dam operating state category, the abnormal risk level, and the corresponding probability distribution. The intermediate-layer feature representation is an intermediate feature vector of the teacher model's feature extraction network or fusion network and reflects the model's deep feature expression of the multimodal monitoring data. The original monitoring data, the teacher model prediction result, and the intermediate-layer feature vector are then combined to form the distillation training sample set for training the student model, in which the teacher model prediction result serves as the supervision information for prediction distillation training and the intermediate-layer feature vector serves as the supervision information for feature distillation training.
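
The sample-set construction in claim 5 amounts to running the teacher once over the dataset and caching its outputs next to each input. In the sketch below the teacher is any callable returning `(probabilities, risk_level, features)`; that signature and the `DistillationSample` fields are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Sequence, Tuple

# Hypothetical teacher interface: returns (class probabilities, risk level, intermediate features).
TeacherFn = Callable[[Any], Tuple[Sequence[float], int, Sequence[float]]]

@dataclass
class DistillationSample:
    inputs: Any                    # original preprocessed monitoring data
    teacher_probs: List[float]     # probability distribution: prediction-distillation target
    teacher_state: int             # operating-state category (argmax of teacher_probs)
    teacher_risk: int              # abnormal risk level predicted by the teacher
    teacher_features: List[float]  # intermediate-layer vector: feature-distillation target

def build_distillation_set(dataset: Iterable[Any], teacher: TeacherFn) -> List[DistillationSample]:
    samples = []
    for x in dataset:
        probs, risk, feats = teacher(x)  # one offline inference pass per sample
        state = max(range(len(probs)), key=lambda i: probs[i])
        samples.append(DistillationSample(x, list(probs), state, risk, list(feats)))
    return samples
```

Caching the teacher outputs once means the large cloud model does not have to be queried again during every student training epoch.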

6. The method for distilling an embedded lightweight model using a large parameter model for dam safety monitoring as described in claim 1, characterized in that: The multi-stage distillation training in step S4 includes a prediction distillation stage and a feature distillation stage. In the prediction distillation stage, the probability distribution output by the teacher model is used as the supervision target, and a loss constraint is applied to the probability distribution output by the student model. In the feature distillation stage, the intermediate-layer feature representation of the teacher model is used as the supervision target, and a loss constraint is applied to the intermediate feature representation of the corresponding layer of the student model. During distillation training, a combined loss function comprising the prediction distillation loss and the feature distillation loss is constructed, and the student model parameters are updated iteratively with the combined loss function as the objective function.
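
Claim 6 does not give the loss formulas. A common choice, assumed here rather than taken from the application, is the temperature-scaled Kullback-Leibler term of Hinton-style distillation for the prediction stage plus a mean-squared error between matched intermediate features for the feature stage: L = α·T²·KL(p_T ∥ q_T) + β·MSE(f_s, f_t). Because the sample set stores the teacher's probabilities rather than its logits, the sketch re-tempers them as p_i^(1/T) renormalized, which equals softmax(z/T) of the teacher's logits z. It also assumes the student feature vector has the same length as the teacher's; when they differ, a small projection layer is commonly inserted. Setting β = 0 or α = 0 gives a prediction-only or feature-only stage, and both non-zero gives the combined objective.

```python
import math
from typing import List, Sequence

def softmax_t(logits: Sequence[float], T: float) -> List[float]:
    """Softmax of logits / T (temperature-softened distribution)."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soften_probs(probs: Sequence[float], T: float) -> List[float]:
    """Re-temper stored probabilities: p_i**(1/T), renormalized, equals softmax(z / T)."""
    powered = [p ** (1.0 / T) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]

def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; zero-probability terms of p contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits: Sequence[float], student_feats: Sequence[float],
                      teacher_probs: Sequence[float], teacher_feats: Sequence[float],
                      T: float = 4.0, alpha: float = 1.0, beta: float = 1.0) -> float:
    """alpha * T^2 * KL(teacher_T || student_T) + beta * MSE(student_feats, teacher_feats)."""
    if len(student_feats) != len(teacher_feats):
        raise ValueError("feature vectors must have equal length (add a projection otherwise)")
    pred_loss = T ** 2 * kl_divergence(soften_probs(teacher_probs, T), softmax_t(student_logits, T))
    feat_loss = sum((s - t) ** 2 for s, t in zip(student_feats, teacher_feats)) / len(teacher_feats)
    return alpha * pred_loss + beta * feat_loss
```

The T² factor keeps the gradient magnitude of the softened prediction term roughly independent of the chosen temperature, so α and β stay comparable when T is tuned.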

7. The method for distilling an embedded lightweight model using a large parameter model for dam safety monitoring as described in claim 1, characterized in that: The embedded deployment constraint optimization includes model pruning, parameter quantization, and model structure compression. Model pruning ranks the neural network connections, convolutional channels, or neurons of the trained student model by the absolute value of the connection weights, the channel norm, or the gradient sensitivity, and deletes the connections, channels, or neurons whose ranking meets a preset pruning condition. Parameter quantization converts the floating-point weight parameters of the student model into fixed-point or integer weight parameters of a preset bit width. Model structure compression merges nodes, deletes redundant branches, and transforms inference operators in the computation graph of the student model to form a compressed model structure suited to execution on an embedded processor.
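
The pruning and quantization parts of claim 7 can be illustrated on a flat weight list. The sketch assumes unstructured magnitude pruning (ranking by absolute weight, one of the criteria the claim lists) and symmetric per-tensor integer quantization with a single shared scale; channel-norm or gradient-sensitivity ranking and the computation-graph rewrites of structure compression depend on the deployment toolchain and are not shown. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def magnitude_prune(weights: Sequence[float], sparsity: float) -> List[float]:
    """Zero the fraction `sparsity` of weights with the smallest absolute values."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(len(weights) * sparsity)
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]))  # ascending |w|
    pruned = list(weights)
    for i in ranked[:n_prune]:
        pruned[i] = 0.0
    return pruned

def quantize_symmetric(weights: Sequence[float], bits: int = 8) -> Tuple[List[int], float]:
    """Map floats to signed integers in [-(2^(bits-1) - 1), 2^(bits-1) - 1] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Approximate reconstruction of the float weights from integers and scale."""
    return [v * scale for v in q]
```

With 8 bits each weight is stored as an integer in [-127, 127] plus one shared floating-point scale, so the rounding error per weight is at most half the scale.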

8. The method for distilling an embedded lightweight model using a large parameter model for dam safety monitoring as described in claim 1, characterized in that: The embedded lightweight model includes the network structure obtained after model pruning, the weight parameters obtained after parameter quantization, and the computation graph obtained after model structure compression. The number of parameters, the model file size, or the computational cost of a single inference of the embedded lightweight model is smaller than that of the teacher model. The embedded lightweight model receives the dam image data and sensor time-series monitoring data collected by the edge monitoring device, performs joint inference on them, and outputs the dam operating state category, the abnormal risk level, or a structural anomaly judgment result.

9. The method for distilling an embedded lightweight model using a large parameter model for dam safety monitoring as described in claim 1, characterized in that: The edge monitoring device includes an embedded processing terminal or an edge computing node. The edge monitoring device communicates with the cloud platform through a monitoring network to receive monitoring data in real time and execute model inference tasks. The edge monitoring device includes a data acquisition module, a data processing module, and a model inference module. The data acquisition module is used to collect dam image data and sensor monitoring data. The data processing module is used to preprocess the monitoring data. The model inference module is used to call an embedded lightweight model to identify the structural state.
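
Claim 9's three-module split of the edge device can be expressed as a small composition of injected callables, which keeps the acquisition hardware, the preprocessing, and the embedded model independently replaceable. The class and method names below are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class EdgeMonitoringDevice:
    acquire: Callable[[], Dict[str, Any]]        # data acquisition module: camera and sensor readout
    preprocess: Callable[[Dict[str, Any]], Any]  # data processing module: cleaning, sync, normalization
    infer: Callable[[Any], Dict[str, Any]]       # model inference module: calls the embedded model

    def step(self) -> Dict[str, Any]:
        """Run one acquisition, preprocessing and inference cycle; return raw data and result."""
        raw = self.acquire()
        model_input = self.preprocess(raw)
        result = self.infer(model_input)
        return {"raw": raw, "result": result}
```

Returning the raw data alongside the result lets the caller decide what to upload to the cloud platform without re-reading the sensors.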

10. The method for distilling an embedded lightweight model using a large parameter model for dam safety monitoring as described in claim 1, characterized in that: After the embedded lightweight model is deployed, the edge monitoring device collects dam image data and sensor time-series monitoring data in real time and calls the embedded lightweight model to perform online inference analysis. When the embedded lightweight model outputs a dam structural anomaly judgment result, or the abnormal risk level reaches a preset warning condition, the edge monitoring device generates corresponding warning information. The edge monitoring device also uploads the monitoring data, inference results, and anomaly samples to the cloud platform. The cloud platform updates the teacher model or the distillation training sample set based on the uploaded data, regenerates the embedded lightweight model, and distributes it to the edge monitoring device.
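
The warning and feedback behavior of claim 10 can be sketched as follows. The warning condition (an explicit anomaly flag, or a risk level at or above a preset level) mirrors the claim; the default level of 2, the message format, and the in-memory upload queue, which stands in for whatever transport buffers data while the link to the cloud is unavailable, are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class WarningPolicy:
    warn_level: int = 2  # hypothetical preset warning condition: risk level >= 2

@dataclass
class UploadQueue:
    """Buffers items for the cloud platform; a stand-in for the real transport layer."""
    pending: List[Dict[str, Any]] = field(default_factory=list)

    def put(self, item: Dict[str, Any]) -> None:
        self.pending.append(item)

def handle_inference(raw: Dict[str, Any], result: Dict[str, Any],
                     policy: WarningPolicy, uploads: UploadQueue) -> Optional[str]:
    """Return a warning message if the result meets the warning condition, else None.

    Every cycle's data and result are queued for upload; anomalous cycles are tagged so the
    cloud side can select them when updating the teacher model or the distillation set.
    """
    anomalous = bool(result.get("anomaly", False)) or result.get("risk_level", 0) >= policy.warn_level
    uploads.put({"raw": raw, "result": result, "anomaly_sample": anomalous})
    if anomalous:
        return f"WARNING point={raw.get('point_id')} risk_level={result.get('risk_level')}"
    return None
```

The warning is generated locally before anything is uploaded, so the edge device can still raise alerts when communication with the cloud platform is unstable.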