A federated semi-supervised object detection method and device for water area ecological monitoring
By employing a multi-teacher-student structure and information entropy-weighted fusion pseudo-labeling technology, and combining target consistency loss and supervision loss to optimize the student detection model, the problems of data privacy and distribution differences in aquatic ecological monitoring are solved, achieving high-precision and robust underwater target detection.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NINGBO DIGITAL TWIN (EASTERN UNIV OF TECH) RES INST
- Filing Date
- 2026-05-20
- Publication Date
- 2026-06-19
AI Technical Summary
In aquatic ecological monitoring, existing federated learning and semi-supervised learning methods are insufficient to effectively address issues such as data privacy protection, data distribution differences, and pseudo-label quality in multi-source underwater target detection, leading to unstable model training and insufficient detection accuracy.
A multi-teacher, one-student structure is adopted. Pseudo-labels are generated by fusing the teacher outputs with weights derived from their information entropy. The student detection model is optimized by combining a target consistency loss with a supervised loss, so that unlabeled and labeled samples are used for joint training, and the model parameters of the clients are aggregated by weighting.
It improves the accuracy and generalization ability of target detection in complex underwater environments, enhances the robustness and stability of the model, and can effectively utilize scarce labeled data for high-precision detection.
Figure CN122244598A_ABST
Abstract
Description
Technical Field
[0001] The embodiments in this specification pertain to the field of underwater environmental monitoring, and specifically relate to a federated semi-supervised target detection method and device for aquatic ecological monitoring.
Background Technology
[0002] In aquatic ecological monitoring, the detection and identification of underwater targets such as individual fish is a crucial means of assessing the status of aquatic ecosystems. Traditional manual observation or sampling analysis methods are time-consuming and labor-intensive, and the detection results are subject to significant uncertainty in complex underwater environments such as light attenuation, water turbidity, and biological obstruction, making it difficult to meet the real-time and accuracy requirements of large-scale continuous monitoring. In recent years, deep learning-based target detection methods (such as Faster R-CNN and the YOLO series) have demonstrated strong performance in underwater image recognition by introducing multi-scale mechanisms such as feature pyramid networks.
[0003] In practical applications, monitoring data is often held by different water nodes or institutions, and cannot be centrally shared due to data privacy and compliance constraints, resulting in "data silos." At the same time, high-quality labeling of underwater targets is highly dependent on scarce expert resources, resulting in very few labeled samples, while a large amount of unlabeled data is difficult to utilize effectively.
[0004] To address both data privacy and label sparsity, the combination of federated learning and semi-supervised learning has gained attention. However, existing research largely treats federated learning and semi-supervised learning as separate frameworks, and directly combining them for underwater target detection remains challenging. Because aquatic environments, lighting conditions, and species composition differ, the local data of different monitoring nodes is significantly non-independent and identically distributed (non-IID). As a result, the pseudo-labels that each node generates from its local data contain substantial noise and exhibit severe class distribution shifts. These low-quality pseudo-labels propagate and accumulate among nodes during model aggregation, which can easily cause the global model to diverge or to forget catastrophically, largely negating the benefits of semi-supervised learning. Because most existing federated semi-supervised schemes assume similar data distributions at each node or highly reliable pseudo-labels, they struggle to adapt directly to the complex data characteristics of underwater environments. Therefore, handling the differences in multi-source data distribution while protecting data privacy, and at the same time improving pseudo-label quality and training stability, is a pressing issue for aquatic ecological target detection.
Summary of the Invention
[0005] The embodiments of this disclosure provide a federated semi-supervised target detection method and apparatus for aquatic ecological monitoring, which aim to solve one or more of the above-mentioned problems and other potential problems.
[0006] According to a first aspect of this disclosure, a federated semi-supervised target detection method for aquatic ecological monitoring is provided, applied to a client in a federated semi-supervised learning framework. The federated semi-supervised learning framework includes a server and multiple clients, each client locally storing unlabeled and labeled samples corresponding to underwater targets. The method includes:
[0007] Construct a student detection model and multiple teacher detection models, with the student detection model and teacher detection models having the same target detection network structure;
[0008] Based on unlabeled samples, each teacher detection model outputs prediction results, and the information entropy corresponding to each prediction result is mapped to the weight of each teacher detection model. The fused pseudo-label is obtained by weighted summation of each prediction result based on the weight. Information entropy is inversely proportional to weight.
[0009] Construct a target consistency loss for unlabeled samples. The target consistency loss is used to constrain the consistency between the output of the student detection model based on unlabeled samples and the fused pseudo-labels.
[0010] A total loss function is constructed based on the target consistency loss and the supervised loss corresponding to labeled samples, and the student detection model is trained according to the total loss function.
[0011] After the student detection model is trained, the model parameters of the student detection model are uploaded to the server, and the student detection model is updated based on the global model parameters returned by the server, so as to detect underwater targets according to the updated student detection model; the server is used to perform weighted aggregation of the model parameters uploaded by each client to obtain the global model parameters.
[0012] According to a second aspect of this disclosure, a federated semi-supervised target detection device for aquatic ecological monitoring is provided, applied to a client in a federated semi-supervised learning framework. The federated semi-supervised learning framework includes a server and multiple clients, each client locally storing unlabeled and labeled samples corresponding to underwater targets. The device includes:
[0013] The model building module is configured to build a student detection model and multiple teacher detection models, with the student detection model and teacher detection models having the same object detection network structure.
[0014] The pseudo-label generation module is configured to generate prediction results based on unlabeled samples, with each teacher detection model outputting its prediction results. The information entropy corresponding to each prediction result is mapped to the weight of each teacher detection model, and the fused pseudo-label is obtained by weighted summation of each prediction result based on the weight. The information entropy is inversely proportional to the weight.
[0015] The consistency loss construction module is configured to construct the target consistency loss for unlabeled samples. The target consistency loss is used to constrain the consistency between the student detection model's output based on unlabeled samples and the fused pseudo-labels.
[0016] The total loss construction module is configured to construct a total loss function based on the target consistency loss and the supervised loss corresponding to the labeled samples, so as to train the student detection model according to the total loss function;
[0017] The parameter update module is configured to upload the model parameters of the student detection model to the server after the student detection model has been trained, and update the student detection model based on the global model parameters returned by the server, so as to detect underwater targets according to the updated student detection model; the server is used to perform weighted aggregation of the model parameters uploaded by each client to obtain the global model parameters.
[0018] According to a third aspect of this disclosure, an electronic device is provided, including one or more processors and a memory associated with the one or more processors, the memory being used to store program instructions that, when read and executed by the one or more processors, perform the method provided according to the first aspect.
[0019] According to a fourth aspect of this disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the method provided according to the first aspect.
[0020] The scheme provided in the embodiments of this specification can utilize a federated semi-supervised learning framework. Within each client, it employs a multi-teacher, one-student structure consisting of multiple teacher detection models with different momentum coefficients and a single student detection model. The teacher outputs are weighted by an uncertainty measure computed from information entropy and fused to generate pseudo-labels, which suppresses pseudo-label noise caused by complex conditions such as underwater light attenuation, turbidity, and target occlusion. Furthermore, by jointly optimizing the student detection model with the target consistency loss from unlabeled samples and the supervised loss from labeled samples, and then updating the global parameters, the scheme can keep collaborative training converging stably even when underwater ecological monitoring labels are extremely sparse and data distributions differ significantly across nodes. Ultimately, this yields a target detection model with high precision and strong generalization for complex underwater environments.
Attached Figure Description
[0021] The above and other features, advantages, and aspects of the embodiments of this disclosure will become more apparent from the accompanying drawings and the following detailed description. In the drawings, the same or similar reference numerals denote the same or similar elements, wherein:
[0022] Figure 1 shows a flowchart of a federated semi-supervised target detection method for aquatic ecological monitoring according to some embodiments of this disclosure;
[0023] Figure 2 shows a schematic diagram of the principle of a federated semi-supervised target detection method for aquatic ecological monitoring according to some embodiments of this disclosure;
[0024] Figure 3 shows a schematic diagram of the structure of a federated semi-supervised target detection device for aquatic ecological monitoring according to some embodiments of this disclosure;
[0025] Figure 4 shows a schematic block diagram of an electronic device according to some embodiments of this disclosure.
Detailed Implementation
[0026] To make the objectives, technical solutions, and advantages of this application clearer, the technical solutions in the embodiments of this specification will be clearly and completely described below with reference to the corresponding drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of this application without creative effort are within the scope of protection of this application.
[0027] The terms “comprising” and “having”, and any variations thereof, in this specification, claims, and the foregoing drawings are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or apparatus that includes a series of steps or units is not limited to the steps or units listed, but may optionally include steps or units not listed, or may optionally include other steps or units inherent to such process, method, product, or apparatus. Depending on the context, the word “if” as it applies herein may be interpreted as “when”, “in response to determination”, or “in response to detection”.
[0028] Figure 1 shows a flowchart of a federated semi-supervised target detection method 100 for aquatic ecological monitoring according to some embodiments of this disclosure. Method 100 can, for example, be executed by a client within a federated semi-supervised learning framework. The framework can consist of a server and $K$ clients, each client corresponding to an independent underwater ecological monitoring node, observation platform, or data acquisition terminal. Clients can be, but are not limited to, mobile phones, tablets, desktop computers, servers, etc., while the server can be a central server. Each client constructs its own local dataset from the data it collects. The local dataset of the $k$-th client is $D_k$, which consists of a labeled sample set and an unlabeled sample set, that is:
[0029] $D_k = D_k^{l} \cup D_k^{u}$,
[0030] where $D_k^{l} = \{(x_i, y_i)\}$ denotes the labeled sample set, $x_i$ is the underwater image data, $y_i$ is the annotation information of the corresponding detection targets, and $D_k^{u} = \{x_j\}$ denotes the unlabeled sample set.
[0031] Each client will train the model locally using its local dataset and interact with the server by exchanging parameters, without transmitting the original image data used for training. This protects data privacy while enabling multi-node collaborative modeling, thus solving the problem of data sharing between different underwater monitoring nodes and providing a foundation for subsequent semi-supervised training using unlabeled samples.
[0032] As shown in Figure 1, in method 100, step 102 can construct a student detection model and multiple teacher detection models, with the student detection model and the teacher detection models having the same target detection network structure.
[0033] In this embodiment, a student detection model and multiple teacher detection models are built within each client. The student detection model can be denoted $f_{S}$, and the set of teacher detection models can be denoted $\{f_{T_m}\}_{m=1}^{M}$, where $M$ is the number of teacher detection models. The teacher detection models and the student detection model use the same target detection network structure, and the detection models of different clients can also use the same target detection network structure. This structure can be, for example, a two-stage detection network based on a feature pyramid network, so that underwater target features are captured at different scales. Labeled samples allow the student detection model to learn the location and category of the detected targets under the supervised loss, while unlabeled samples are passed through the teacher detection models, whose prediction results are used to construct pseudo-labels that guide the training of the student detection model. With this multi-teacher, one-student structure, the pseudo-labels no longer depend on a single teacher detection model, which reduces the adverse impact of single-model bias on pseudo-label quality.
[0034] In method 100, step 104 can, based on unlabeled samples, have each teacher detection model output a prediction result, map the information entropy corresponding to each prediction result to the weight of the corresponding teacher detection model, and obtain the fused pseudo-label by a weighted summation of the prediction results using these weights, where information entropy is inversely related to weight.
[0035] In this embodiment, as shown in Figure 2, any unlabeled sample $x^{u}$ can be input simultaneously into the $M$ constructed teacher detection models. The prediction results output by the teacher detection models are denoted as:
[0036] $p_m = f_{T_m}(x^{u}), \quad m = 1, \dots, M$,
[0037] To reduce the interference of noise and pseudo-labels on model training, a weighted fusion method based on prediction uncertainty is used to aggregate the outputs of multiple teacher detection models. The uncertainty of the prediction results of each teacher detection model is measured by calculating information entropy. A higher information entropy value indicates that the predictions of the teacher detection model for the current sample are more dispersed and have lower reliability; conversely, a lower information entropy value indicates that the predictions are more concentrated and have higher reliability. Based on the information entropy measurement results, the information entropy corresponding to each teacher detection model can be converted into their respective weights through a preset mapping relationship. This mapping relationship has the characteristic of assigning high weights to low entropy values and low weights to high entropy values, thereby achieving the effect of giving greater contribution to the outputs of high-confidence teachers and suppressing the outputs of low-confidence teachers. The specific mapping relationship expression can be set according to needs while satisfying the above characteristics. Finally, the prediction results of each teacher model are weighted and fused according to the determined weight coefficients to generate the fused pseudo-labels for the unlabeled samples as follows:
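[0038] $\hat{y} = \sum_{m=1}^{M} w_m \, p_m$,
[0039] where $\hat{y}$ is the fused pseudo-label, $p_m$ is the prediction result of the $m$-th teacher detection model, $w_m$ is the weight of the $m$-th teacher detection model, and $M$ is the number of teacher detection models.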
[0040] Through the above settings, the prediction uncertainty of multiple teacher detection models can be dynamically evaluated and softly aggregated, so that the high-confidence teacher detection model contributes more to the pseudo-label, thereby improving the reliability of the pseudo-label and reducing the pseudo-label error caused by factors such as changes in lighting, water turbidity, and target occlusion in complex underwater scenes. This effectively improves the reliability and robustness of the fused pseudo-label in complex underwater environments.
[0041] In method 100, step 106 can construct the target consistency loss corresponding to the unlabeled samples. The target consistency loss is used to constrain the consistency between the output of the student detection model based on the unlabeled samples and the fused pseudo-labels.
[0042] In this embodiment, a target consistency loss is constructed on unlabeled samples to constrain the consistency between the student detection model's output based on unlabeled samples and the fused pseudo-label. This target consistency loss can constrain consistency from at least one of the following dimensions: First, based on the difference in the overlap between the student output and the fused pseudo-label in the predicted bounding boxes, a consistency constraint on bounding box overlap is constructed, enabling the student detection model to more accurately learn the localization information of the detected target. Second, based on the difference in the spatial position of the center point of the predicted box between the student output and the fused pseudo-label, a consistency constraint on the center point position is constructed, enhancing the model's spatial alignment and target matching capabilities in densely occluded scenes through a bidirectional distance metric. Third, based on the center point matching, a consistency constraint on the class probability is constructed according to the difference in the class probability distribution between the student output and the fused pseudo-label in the matching box, ensuring that the two models maintain consistent judgments for the same class of target. Through these multiple dimensions of consistency constraints, the student detection model can be guided to output detection results on unlabeled samples that are consistent with the fused pseudo-label.
[0043] In method 100, step 108 can construct a total loss function based on the target consistency loss and the supervised loss corresponding to the labeled samples, so as to train the student detection model according to the total loss function.
[0044] In this embodiment, a total loss function is constructed so that the student detection model is trained with the supervised loss on labeled samples and with the target consistency loss on unlabeled samples. Together these form the overall optimization objective of the student model. In practice, the consistency term is computed over both the unlabeled and the labeled samples, so that the labeled samples provide an additional stabilizing correction for the consistency constraint. The total loss function can be expressed as:
[0045] $\mathcal{L}_{total} = \mathcal{L}_{sup}(D^{l}) + \mathcal{L}_{cons}(D^{l} \cup D^{u})$,
[0046] where $\mathcal{L}_{cons}$ is the target consistency loss, $\mathcal{L}_{sup}$ is the supervised loss, $D^{l}$ is the labeled sample set, and $D^{u}$ is the unlabeled sample set.
[0047] By using the above joint optimization method, the student detection model can simultaneously learn precise supervision information from labeled samples and soft supervision information provided by the teacher detection model from unlabeled samples, thereby improving the overall detection accuracy and model generalization ability.
[0048] In method 100, step 110 can upload the model parameters of the student detection model to the server after the student detection model is trained, and update the student detection model based on the global model parameters returned by the server, so as to detect underwater targets according to the updated student detection model; the server is used to perform weighted aggregation of the model parameters uploaded by each client to obtain the global model parameters.
[0049] In this embodiment, after each client completes local training, it uploads the model parameters of the student detection model to the server. The server then performs weighted aggregation based on the sample size of each client to obtain the global model parameters. Let the global model parameters in the $t$-th communication round be $\theta^{t}$, and let the increment of model parameters uploaded by the $k$-th client be $\Delta\theta_k^{t}$. Then the global model update formula is:
[0050] $\theta^{t+1} = \theta^{t} + \sum_{k=1}^{K} \frac{n_k}{n} \, \Delta\theta_k^{t}$,
[0051] where $\frac{n_k}{n}$ denotes the proportion of the $k$-th client's sample count $n_k$ in the total sample count $n = \sum_{k=1}^{K} n_k$. This weighted aggregation lets clients with larger sample sizes contribute more to the global model, while also integrating data characteristics from different waters, collection devices, and ecological environments to improve the adaptability of the global model in cross-scenario underwater fish disease detection.
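The sample-size-weighted aggregation described above can be sketched in plain Python. This is a minimal illustration, not the patented implementation: it assumes each client uploads a parameter increment as a flat list of floats, and the names `aggregate`, `client_deltas`, and `client_sizes` are hypothetical.

```python
def aggregate(global_params, client_deltas, client_sizes):
    """Add the sample-size-weighted sum of client increments to the global parameters.

    Assumption: parameters and increments are flat lists of equal length.
    """
    total = sum(client_sizes)
    new_params = list(global_params)
    for delta, n_k in zip(client_deltas, client_sizes):
        share = n_k / total  # proportion of this client's samples in the total
        for i, d in enumerate(delta):
            new_params[i] += share * d
    return new_params


# Toy example: two clients, the second holds three times as many samples.
theta = [0.0, 1.0]
deltas = [[1.0, 0.0], [0.0, 2.0]]
sizes = [100, 300]
updated = aggregate(theta, deltas, sizes)
```

In the toy example the second client's increment receives weight 0.75 and the first client's receives 0.25, so the client with more samples moves the global parameters further.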
[0052] In summary, this application integrates a multi-teacher guidance mechanism, a multi-teacher update strategy, and multiple consistency loss designs within a federated semi-supervised framework. By providing complementary supervision information through a multi-teacher model and utilizing an uncertainty-aware aggregation strategy to improve pseudo-label quality, combined with multi-dimensional consistency constraints designed for dense target detection tasks, the student model can more fully utilize unlabeled underwater image data for training. This effectively solves the problems in existing technologies such as insufficient labeled data, unreliable pseudo-labels from a single teacher, difficulty in target alignment, and training instability caused by inconsistent data distribution across nodes. As a result, it significantly improves the detection accuracy and model robustness in underwater ecological environment target detection tasks.
[0053] In one possible implementation, the information entropy corresponding to each prediction result is mapped to the weights of each teacher detection model, including:
[0054] For any prediction result output by a teacher detection model, calculate the information entropy corresponding to the prediction result, and determine the weight of the teacher detection model based on the softmax normalization value of the negative information entropy.
[0055] In this embodiment, the uncertainty of the prediction results of the $m$-th teacher detection model, measured by information entropy, can be calculated as:
[0056] $H_m = -\sum_{c=1}^{C} p_{m,c} \log p_{m,c}$,
[0057] where $p_{m,c}$ denotes the probability predicted by the $m$-th teacher detection model for category $c$, and $C$ denotes the total number of target categories.
[0058] Based on the aforementioned uncertainties, the weight of the $m$-th teacher detection model is:
[0059] $w_m = \frac{\exp(-H_m)}{\sum_{j=1}^{M} \exp(-H_j)}$,
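As a rough illustration of the entropy-to-weight mapping and the weighted fusion, the following standard-library Python sketch computes the entropy of each teacher's class distribution, applies a softmax to the negative entropies, and forms the weighted sum. It assumes the teachers' predictions have already been aligned to the same candidate box, which the text does not detail; the function names and the three example distributions are hypothetical.

```python
import math


def entropy(probs, eps=1e-12):
    """Shannon entropy H = -sum_c p_c * log(p_c) of one class distribution."""
    return -sum(p * math.log(p + eps) for p in probs)


def teacher_weights(teacher_probs):
    """Softmax over negative entropies: low-entropy teachers get larger weights."""
    neg_h = [-entropy(p) for p in teacher_probs]
    shift = max(neg_h)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in neg_h]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(teacher_probs):
    """Weighted sum of the teachers' class distributions (fused soft pseudo-label)."""
    w = teacher_weights(teacher_probs)
    n_classes = len(teacher_probs[0])
    return [sum(w[m] * teacher_probs[m][c] for m in range(len(w)))
            for c in range(n_classes)]


# Hypothetical outputs of three teachers for one candidate box and three classes.
teachers = [
    [0.90, 0.05, 0.05],  # confident teacher (low entropy)
    [0.40, 0.30, 0.30],  # uncertain teacher (high entropy)
    [0.70, 0.20, 0.10],
]
weights = teacher_weights(teachers)
pseudo_label = fuse(teachers)
```

Here the first teacher receives the largest weight and the second the smallest, so the fused distribution leans toward the confident prediction while still summing to one.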
[0060] In one possible implementation, the method further includes:
[0061] For any teacher detection model, the parameters of each teacher detection model are updated by exponential moving average based on the current parameters of the student detection model. Different teacher detection models use different momentum coefficients when updating by exponential moving average.
[0062] In this embodiment, the parameters of the teacher detection models are updated from the student model parameters using an exponential moving average strategy. Let the parameters of the $m$-th teacher detection model at the $i$-th local iteration be $\theta_{T_m}^{(i)}$, and the parameters of the student detection model at the $i$-th local iteration be $\theta_{S}^{(i)}$. Then the parameter update formula of the $m$-th teacher detection model is:
[0063] $\theta_{T_m}^{(i)} = (1 - \alpha_m)\,\theta_{T_m}^{(i-1)} + \alpha_m\,\theta_{S}^{(i)}$,
[0064] where $\alpha_m \in (0, 1)$ is the momentum coefficient corresponding to the $m$-th teacher detection model.
[0065] Different teacher detection models employ different momentum coefficients to vary their sensitivity to changes in student detection model parameters. Teacher detection models with smaller momentum coefficients update more slowly, maintaining smoother and more stable historical knowledge; models with larger momentum coefficients update more quickly, reflecting the current learning state of the student detection model more promptly. By setting multiple teacher detection models with different update rates, parameter differences can be maintained among them during training, resulting in complementary predictions and preventing multiple teacher detection models from converging to identical representation states.
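The effect of giving each teacher its own coefficient can be sketched as follows. This is an illustrative assumption rather than the confirmed update rule: the coefficient is taken to weight the student's current parameters, which matches the statement that smaller coefficients update more slowly. The coefficient values and the two-element parameter lists are hypothetical.

```python
def ema_update(teacher, student, alpha):
    """Move each teacher parameter toward the student by a fraction alpha.

    Assumed convention: teacher <- (1 - alpha) * teacher + alpha * student,
    so a smaller alpha means a slower-moving teacher.
    """
    return [(1.0 - alpha) * t + alpha * s for t, s in zip(teacher, student)]


alphas = [0.001, 0.01, 0.05, 0.1]       # hypothetical values, one per teacher
teachers = [[0.0, 0.0] for _ in alphas]  # all teachers start from the same point
student = [1.0, -1.0]                    # student held fixed for illustration

for _ in range(10):  # ten local iterations
    teachers = [ema_update(t, student, a) for t, a in zip(teachers, alphas)]
```

After ten iterations the teachers have moved different distances toward the student, so they hold distinct parameters even though they started identically, which is the diversity the multi-teacher design relies on.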
[0066] In one possible implementation, the target consistency loss corresponding to unlabeled samples is constructed, including:
[0067] The intersection-union ratio (IoU) consistency loss, the center consistency loss, and the class probability consistency loss between the outputs of the student detection model and the teacher detection model on unlabeled samples are calculated. The intersection-union ratio consistency loss constrains the consistency of the degree of overlap of the bounding boxes, the center consistency loss constrains the consistency of the spatial location of the target, and the class probability consistency loss constrains the consistency of the judgment of the same class of target.
[0068] The target consistency loss is obtained by weighting the intersection-union consistency loss, the center consistency loss, and the class probability consistency loss.
[0069] In this embodiment, to improve the consistency between the outputs of the teacher detection model and the student detection model in dense target scenarios, multiple consistency losses (including intersection-union consistency loss, center consistency loss, and class probability consistency loss) will be constructed on unlabeled samples, and the target consistency loss will be constructed by weighting them.
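[0070] The formula for calculating the intersection-union ratio (IoU) consistency loss is:
[0071] $\mathcal{L}_{iou} = \frac{1}{N} \sum_{i=1}^{N} \left(1 - \mathrm{IoU}(b_i^{T}, \hat{b}_i^{S})\right)$,
[0072] where $N$ is the number of predicted boxes, $b_i^{T}$ is the $i$-th predicted box output by the teacher detection model, $\hat{b}_i^{S}$ is the predicted box in the student detection model's output set $B^{S}$ that has the largest intersection-union ratio with $b_i^{T}$, and $\mathrm{IoU}(b_i^{T}, \hat{b}_i^{S})$ denotes the intersection-union ratio between $b_i^{T}$ and $\hat{b}_i^{S}$.
[0073] $\hat{b}_i^{S}$ can be represented as:
[0074] $\hat{b}_i^{S} = \arg\max_{b \in B^{S}} \mathrm{IoU}(b_i^{T}, b)$,
[0075] where $\mathrm{IoU}(b_i^{T}, b)$ denotes the intersection-union ratio between $b_i^{T}$ and $b$.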
[0076] The intersection-union ratio consistency loss is used to constrain the consistency between the teacher detection model and the student detection model in terms of the degree of overlap of the bounding boxes, so that the student detection model can learn the localization information of the detected target more accurately.
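A minimal sketch of this overlap constraint is given below. It assumes boxes are `(x1, y1, x2, y2)` tuples and that the per-box penalty is `1 - IoU` averaged over the teacher boxes; the exact functional form is an assumption, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection-union ratio of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_consistency_loss(teacher_boxes, student_boxes):
    """Mean of (1 - best IoU) over teacher boxes; the 1 - IoU form is an assumption."""
    if not teacher_boxes:
        return 0.0
    total = 0.0
    for t in teacher_boxes:
        # Match each teacher box to the student box that overlaps it most.
        best = max((iou(t, s) for s in student_boxes), default=0.0)
        total += 1.0 - best
    return total / len(teacher_boxes)


teacher_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
student_boxes = [(0, 0, 10, 10), (20, 20, 30, 25), (50, 50, 60, 60)]
loss = iou_consistency_loss(teacher_boxes, student_boxes)
```

In this example the first teacher box is matched perfectly and the second is covered by half, so the two penalties are 0 and 0.5 and the loss is their mean.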
[0077] For each element in the set of predicted bounding box centers of the teacher detection model, the center point with the smallest Euclidean distance is searched for in the set of predicted bounding box centers of the student detection model, and the same matching is performed in the opposite direction, to obtain the matched center sets. The corresponding formula for calculating the center consistency loss is:
[0078] $\mathcal{L}_{center} = \frac{1}{|C^{T}|} \sum_{c^{T} \in C^{T}} \lVert c^{T} - \hat{c}^{S} \rVert_2 + \frac{1}{|C^{S}|} \sum_{c^{S} \in C^{S}} \lVert c^{S} - \hat{c}^{T} \rVert_2$,
[0079] where $C^{S}$ is the set of predicted bounding box centers of the student detection model, $C^{T}$ is the set of predicted bounding box centers of the teacher detection model, $\hat{c}^{T}$ is the center in $C^{T}$ matched to the student center $c^{S}$, $\hat{c}^{S}$ is the center in $C^{S}$ matched to the teacher center $c^{T}$, $c^{S}$ is the center of a predicted bounding box of the student detection model, and $c^{T}$ is the center of a predicted bounding box of the teacher detection model.
[0080] By introducing center consistency loss, the consistency between the teacher detection model and the student detection model in the target space location can be further constrained, thereby enhancing the target matching ability in densely occluded scenes.
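The bidirectional nearest-center matching can be sketched as follows. This assumes centers are `(x, y)` tuples and that each direction is averaged over its own set using the plain Euclidean distance; whether the original uses squared distances or a different normalization is not recoverable from the text, and the function names are hypothetical.

```python
import math


def nearest_distance(point, candidates):
    """Euclidean distance from point to its closest candidate center."""
    return min(math.dist(point, c) for c in candidates)


def center_consistency_loss(teacher_centers, student_centers):
    """Sum of the teacher-to-student and student-to-teacher mean nearest distances."""
    if not teacher_centers or not student_centers:
        return 0.0  # nothing to match in this sketch
    t_to_s = sum(nearest_distance(t, student_centers)
                 for t in teacher_centers) / len(teacher_centers)
    s_to_t = sum(nearest_distance(s, teacher_centers)
                 for s in student_centers) / len(student_centers)
    return t_to_s + s_to_t


teacher_centers = [(0.0, 0.0), (10.0, 0.0)]
student_centers = [(0.0, 3.0), (10.0, 4.0)]
loss = center_consistency_loss(teacher_centers, student_centers)
```

Matching in both directions penalizes student boxes that have no nearby teacher box as well as teacher boxes the student has missed, which a one-directional match would not do.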
[0081] The class probability consistency loss is represented by the Kullback-Leibler (KL) divergence, and the calculation formula is as follows:
[0082] $\mathcal{L}_{cls} = D_{KL}(p^{T} \,\|\, p^{S}) = \sum_{c=1}^{C} p^{T}(c) \log \frac{p^{T}(c)}{p^{S}(c)}$,
[0083] where $p^{T}$ is the class probability distribution of a predicted bounding box of the teacher detection model, $p^{S}$ is the class probability distribution of the matched predicted bounding box of the student detection model, $C$ is the total number of target categories, and $D_{KL}(p^{T} \,\|\, p^{S})$ denotes the KL divergence with $p^{T}$ as the true distribution and $p^{S}$ as the approximating distribution, which measures the difference between the two probability distributions. The class probability consistency loss is used to ensure that the teacher detection model and the student detection model make consistent judgments about the class of the same target.
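For one matched pair of boxes, the KL-based class consistency term can be sketched in a few lines of Python. The teacher distribution is treated as the reference and the student distribution as the approximation, as stated above; the small `eps` guard and the example distributions are assumptions added for illustration.

```python
import math


def kl_divergence(p_teacher, p_student, eps=1e-12):
    """KL(p_teacher || p_student) for two discrete class distributions."""
    return sum(
        p * math.log((p + eps) / (q + eps))
        for p, q in zip(p_teacher, p_student)
        if p > 0  # terms with zero teacher probability contribute nothing
    )


same = kl_divergence([0.5, 0.5], [0.5, 0.5])
forward = kl_divergence([0.5, 0.5], [0.9, 0.1])
reverse = kl_divergence([0.9, 0.1], [0.5, 0.5])
```

The divergence is zero when the two distributions agree and is not symmetric, so which model is treated as the reference distribution affects the gradient the student receives.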
[0084] Combining the intersection-union ratio consistency loss, the center consistency loss, and the class probability consistency loss described above, the target consistency loss can be obtained as follows:
[0085] $\mathcal{L}_{cons} = \lambda_1 \mathcal{L}_{iou} + \lambda_2 \mathcal{L}_{center} + \lambda_3 \mathcal{L}_{cls}$,
[0086] where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the weighting coefficients of the loss terms, which can be preset. By constraining the consistency between the outputs of the teacher and student detection models in three respects (bounding box overlap, center point position, and class probability), the effectiveness of the student detection model in learning from unlabeled samples can be improved.
[0087] In one possible implementation, after updating the student detection model based on the global model parameters returned by the server, the method further includes:
[0088] The updated student detection model will undergo another round of federated communication training.
[0089] In this embodiment, after updating the student detection model and the teacher detection model according to the global model parameters, another round of federated communication training will be carried out to enable the updated model to regenerate new fused pseudo-labels, and then train again based on the new fused pseudo-labels. This achieves a co-evolutionary process of generating more accurate pseudo-labels with a better model and then training a better model with more accurate pseudo-labels.
[0090] Furthermore, to verify the effectiveness of the proposed method for underwater fish disease detection, experiments were conducted on a publicly available fish disease dataset. The dataset contains seven categories of fish condition: bacterial diseases, bacterial gill disease, bacterial red disease, fungal diseases, parasitic diseases, viral diseases, and healthy fish. Each category contains 250 images, for a total of 1750 images. To reflect the scarcity of labeled data in practical applications, only a subset of the samples was used as labeled data, while the remainder was used as unlabeled data for semi-supervised training.
[0091] In the federated learning setting, the data is distributed across multiple clients according to a Dirichlet distribution to construct a non-independent and identically distributed (Non-IID) data scenario. The system is configured with five clients; each client trains its model on local data, and the model parameters are aggregated and updated on the server side. Four teacher models are used, each updated via an exponential moving average with a different momentum coefficient. Model training uses a joint loss function comprising the supervised loss and the multidimensional consistency loss.
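One common way to build such a Non-IID split is to draw, for each class, the clients' shares of that class from a Dirichlet distribution. The sketch below follows that convention using only the Python standard library. The per-class scheme, the function name `dirichlet_partition`, the seed, and the value `alpha=0.5` are assumptions for illustration; the specification does not give the exact partitioning procedure.

```python
import random


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients; per class, shares ~ Dirichlet(alpha)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # A Dirichlet sample is a vector of independent Gamma(alpha, 1) draws, normalized.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        shares = [g / total for g in gammas]
        # Turn the shares into cut points over this class's shuffled indices.
        start, cumulative = 0, 0.0
        for client_id, share in enumerate(shares):
            cumulative += share
            if client_id == num_clients - 1:
                end = len(indices)
            else:
                end = min(len(indices), int(round(cumulative * len(indices))))
            clients[client_id].extend(indices[start:end])
            start = end
    return clients


# Seven classes with 250 images each, split across five clients.
labels = [c for c in range(7) for _ in range(250)]
parts = dirichlet_partition(labels, num_clients=5, alpha=0.5)
```

A smaller `alpha` concentrates each class on fewer clients, which gives a more strongly Non-IID split.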
[0092] As shown in Table 1, the proposed method was compared with federated learning methods (such as FedAvg and FedProx) and federated semi-supervised methods (such as FL+FixMatch and FL+Mean Teacher) under different annotation ratios (10%, 20%, 50%, and 100%). The proposed method achieved the best results across all metrics (mAP and...). The improvement is more pronounced at low annotation ratios, demonstrating that the proposed federated semi-supervised learning framework can effectively utilize unlabeled data. Although the performance of all methods improves as the annotation ratio increases, the proposed method maintains a stable advantage.
[0093] Table 1 Comparison of fish disease detection results under different labeling ratios
[0094]
[0095] Based on the results in Table 2, with 20% labeled data, the method of this application exhibits a stable convergence trend under different Non-IID distributions (different values of the Dirichlet distribution parameter), with performance gradually improving and eventually stabilizing as the number of communication rounds increases. A smaller parameter value leads to greater differences in data distribution among clients and a slight decrease in model performance, which nevertheless remains at a high level overall. Specifically, under a strong Non-IID condition (…), the model's final mAP reached 64.9%. Under an even stronger Non-IID condition (…), performance declined further but the model still converged stably, with the mAP decreasing by only 3.3%, indicating that the method of this application remains robust in federated learning environments with large differences in data distribution.
[0096] Table 2 Detection results under different data distribution conditions
[0097]
[0098] The results in Table 3 show that the momentum coefficient setting affects model performance with 20%, 50%, and 70% labeled data. Overall, allocating the momentum coefficients to the teacher models in increasing order yields better performance than random allocation, indicating that teacher models with ordered update rates provide more effective supervision information.
[0099] Furthermore, across the different annotation ratios, model performance was best when the momentum coefficient was set to 0.015. Increasing or decreasing the coefficient further reduced performance slightly, indicating that the teacher model update speed needs to strike a balance between stability and adaptability.
[0100] Table 3. Detection results under different momentum coefficient setting strategies
[0101]
[0102] In summary, Table 3 verifies the effectiveness of the differentiated momentum allocation strategy in the multi-teacher framework, which can improve the quality of pseudo-labels and enhance the stability of model training.
[0103] Figure 3 shows a schematic diagram of a federated semi-supervised target detection device 300 for aquatic ecological monitoring according to some embodiments of this disclosure. The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments. As shown in Figure 3, the device 300 includes a model building module 301, which is configured to build a student detection model and multiple teacher detection models, wherein the student detection model and the teacher detection models have the same target detection network structure;
[0104] The pseudo-label generation module 302 is configured to have each teacher detection model output a prediction result based on the unlabeled samples, map the information entropy corresponding to each prediction result to a weight for the corresponding teacher detection model, and obtain the fused pseudo-label by weighted summation of the prediction results based on the weights, wherein the information entropy is inversely proportional to the weight;
[0105] The consistency loss construction module 303 is configured to construct the target consistency loss corresponding to the unlabeled samples, wherein the target consistency loss is used to constrain the consistency between the fused pseudo-labels and the output of the student detection model based on the unlabeled samples;
[0106] The total loss construction module 304 is configured to construct a total loss function based on the target consistency loss and the supervised loss corresponding to the labeled samples, so as to train the student detection model according to the total loss function;
[0107] The parameter update module 305 is configured to upload the model parameters of the student detection model to the server after the student detection model has been trained, and update the student detection model based on the global model parameters returned by the server, so as to detect underwater targets according to the updated student detection model; the server is used to perform weighted aggregation of the model parameters uploaded by each client to obtain the global model parameters.
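The server-side weighted aggregation can be sketched as a weighted average of the parameters uploaded by the clients. The code below assumes FedAvg-style weights proportional to each client's sample count and represents parameters as plain dictionaries of float lists. Both choices are assumptions, since the specification does not state how the aggregation weights are determined.

```python
def aggregate(client_params, client_weights):
    """Weighted average of client parameters; weights are normalized by their sum."""
    total = float(sum(client_weights))
    names = client_params[0].keys()
    return {
        name: [
            sum(w * params[name][i] for params, w in zip(client_params, client_weights)) / total
            for i in range(len(client_params[0][name]))
        ]
        for name in names
    }


# Two clients holding 100 and 300 samples (hypothetical counts and parameters).
clients = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
]
global_params = aggregate(clients, client_weights=[100, 300])
```

Each client would then overwrite its student detection model with `global_params` before starting the next communication round.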
[0108] In one possible implementation, the pseudo-label generation module 302 is further configured to calculate the information entropy corresponding to the prediction result output by any teacher detection model, and determine the weight of the teacher detection model based on the softmax normalization value of the negative information entropy.
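The entropy-to-weight mapping can be sketched as follows for the class probabilities of a single predicted box. The code assumes the teachers' predictions for that box have already been aligned with one another, and the function names and example probabilities are illustrative only.

```python
import math


def entropy(probs, eps=1e-12):
    """Shannon entropy of a discrete distribution (natural logarithm)."""
    return -sum(p * math.log(p + eps) for p in probs)


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_pseudo_label(teacher_probs):
    """Weight each teacher by softmax(-entropy), then take the weighted sum."""
    entropies = [entropy(p) for p in teacher_probs]
    weights = softmax([-h for h in entropies])
    num_classes = len(teacher_probs[0])
    fused = [
        sum(w * p[c] for w, p in zip(weights, teacher_probs))
        for c in range(num_classes)
    ]
    return fused, weights


# Three teachers, three classes: a confident, an uncertain, and an intermediate teacher.
teacher_probs = [[0.9, 0.05, 0.05], [0.4, 0.3, 0.3], [0.7, 0.2, 0.1]]
fused, weights = fuse_pseudo_label(teacher_probs)
```

The most confident teacher (lowest entropy) receives the largest weight, so the fused pseudo-label leans toward its prediction while still using the others.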
[0109] In one possible implementation, the device further includes a teacher model update module configured to update, for any one of the teacher detection models, the parameters of that teacher detection model by an exponential moving average based on the current parameters of the student detection model, wherein different teacher detection models use different momentum coefficients in the exponential moving average update.
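A minimal sketch of the multi-teacher exponential moving average update is given below. It assumes the common convention in which the momentum coefficient weights the teacher's previous parameters. The momentum values are hypothetical and are not those used in the experiments, and the specification may define the coefficient differently.

```python
def ema_update(teacher, student, momentum):
    """Element-wise teacher <- momentum * teacher + (1 - momentum) * student."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


# Four teachers with increasing (hypothetical) momentum coefficients.
momenta = [0.99, 0.995, 0.999, 0.9995]
student = [1.0, -2.0]  # stand-in for the student model's current parameters
teachers = [[0.0, 0.0] for _ in momenta]

for _ in range(100):
    teachers = [ema_update(t, student, m) for t, m in zip(teachers, momenta)]
```

With a fixed student, a teacher with a smaller momentum moves toward the student faster, so the teachers end up tracking the student at different time scales.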
[0110] In one possible implementation, the consistency loss construction module 303 is further configured to calculate the intersection-over-union (IoU) consistency loss, the center consistency loss, and the class probability consistency loss when the student detection model and the teacher detection models are trained based on the unlabeled samples. The IoU consistency loss is used to constrain the consistency of the degree of overlap of the bounding boxes, the center consistency loss is used to constrain the consistency of the spatial location of the target, and the class probability consistency loss is used to constrain the consistency of the class judgment for the same target. The IoU consistency loss, the center consistency loss, and the class probability consistency loss are weighted to obtain the target consistency loss.
[0111] In one possible implementation, the formula for calculating the intersection-over-union (IoU) consistency loss is:
[0112]
[0113] where $N$ is the number of predicted boxes, $b_i^t$ is the $i$-th predicted box output by the teacher detection model, $\hat{b}_i^s$ is the predicted box in the output set of the student detection model that has the largest IoU with $b_i^t$, and $\mathrm{IoU}(b_i^t, \hat{b}_i^s)$ denotes the IoU between $b_i^t$ and $\hat{b}_i^s$;
[0114] The formula for calculating the center consistency loss is:
[0115]
[0116] where $C_s$ is the set of predicted box centers of the student detection model, $C_t$ is the set of predicted box centers of the teacher detection model, $\hat{c}_t$ is the center of the first predicted box, which is matched to the center $c_s$ of a predicted box of the student detection model, $\hat{c}_s$ is the center of the second predicted box, which is matched to the center $c_t$ of a predicted box of the teacher detection model, $c_s$ is the center of a predicted box of the student detection model, and $c_t$ is the center of a predicted box of the teacher detection model;
[0117] The formula for calculating the class probability consistency loss is:
[0118] $L_{cls} = D_{KL}\left(P_t \,\|\, P_s\right)$
[0119] where $P_t$ is the class probability of a predicted bounding box from the teacher detection model, $P_s$ is the class probability of the corresponding predicted bounding box from the student detection model, and $D_{KL}(P \,\|\, Q)$ denotes the KL divergence, which takes $P$ as the true distribution and $Q$ as the approximating distribution and measures the difference between the two probability distributions.
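The two geometric terms can be sketched as follows. Because the formulas themselves are not reproduced in this text, the code assumes plausible forms: the IoU term is taken as the mean of 1 − IoU between each teacher box and its best-overlapping student box, and the center term as a symmetric mean Euclidean distance with nearest-center matching. These forms, the matching rule, the `(x1, y1, x2, y2)` box format, and all function names are assumptions, not the specification's definitions.

```python
import math


def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def iou_consistency_loss(teacher_boxes, student_boxes):
    """Assumed form: mean over teacher boxes of 1 - IoU with the best student box."""
    return sum(
        1.0 - max(iou(t, s) for s in student_boxes) for t in teacher_boxes
    ) / len(teacher_boxes)


def center(box):
    """Center point of a box given as (x1, y1, x2, y2)."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def center_consistency_loss(teacher_boxes, student_boxes):
    """Assumed form: symmetric mean distance between each center and its nearest match."""
    ct = [center(b) for b in teacher_boxes]
    cs = [center(b) for b in student_boxes]
    s_to_t = sum(min(math.dist(c, o) for o in ct) for c in cs) / len(cs)
    t_to_s = sum(min(math.dist(c, o) for o in cs) for c in ct) / len(ct)
    return s_to_t + t_to_s


# Made-up boxes: the first pair coincides, the second is shifted by 2 pixels in x.
teacher_boxes = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0)]
student_boxes = [(0.0, 0.0, 10.0, 10.0), (22.0, 20.0, 32.0, 30.0)]
l_iou = iou_consistency_loss(teacher_boxes, student_boxes)
l_center = center_consistency_loss(teacher_boxes, student_boxes)
```

Both functions expect non-empty box lists; a real training loop would need to handle images on which one of the models predicts no boxes.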
[0120] In one possible implementation, the formula for calculating the total loss function is:
[0121] $L_{total} = L_{sup}(D_l) + L_{cons}(D_u)$
[0122] where $L_{cons}$ is the target consistency loss, $L_{sup}$ is the supervised loss, $D_l$ is the labeled sample set, and $D_u$ is the unlabeled sample set.
[0123] In one possible implementation, the parameter update module 305 is also configured to perform the next round of federated communication training on the updated student detection model.
[0124] The above embodiments can be implemented, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, they can be implemented, in whole or in part, as a computer program product. A computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the flows or functions according to the embodiments of this specification are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions can be stored in or transmitted through a computer-readable storage medium. The computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave, etc.) means. The computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available media can be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., Digital Versatile Discs (DVDs)), or semiconductor media (e.g., Solid State Disks (SSDs)).
[0125] Figure 4 shows a block diagram of an electronic device 400 that can implement various embodiments of the present disclosure. As shown in Figure 4, the electronic device 400 includes a processor 410, a disk drive 420, an input / output interface 430, a network interface 440, and a memory 450. The processor 410, disk drive 420, input / output interface 430, network interface 440, and memory 450 can communicate with each other via a communication bus 460.
[0126] The processor 410 can be implemented using a general-purpose CPU, microprocessor, application-specific integrated circuit (ASIC), or one or more integrated circuits to execute relevant programs and implement the technical solution provided in this application.
[0127] The memory 450 can be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), static memory, dynamic storage devices, etc. The memory 450 can store the operating system 451 used to control the operation of the electronic device 400, and the basic input / output system (BIOS) 452 used to control the low-level operations of the electronic device 400. Additionally, it can store a web browser 453, a data storage management system 454, etc. In summary, when the technical solution provided in this application is implemented through software or firmware, the relevant program code is stored in the memory 450 and is called and executed by the processor 410.
[0128] Input / output interface 430 is used to connect input / output modules to realize information input and output. Input / output modules can be configured as components in the device (not shown in the figure) or externally connected to the device to provide corresponding functions. Input devices may include keyboards, mice, touch screens, microphones, various sensors, etc., and output devices may include displays, vibrators, indicator lights, etc.
[0129] Network interface 440 is used to connect a communication module (not shown in the figure) to enable communication and interaction between the device and other devices. The communication module can communicate via wired means (such as USB, Ethernet cable, etc.) or wireless means (such as mobile network, WIFI, Bluetooth, etc.).
[0130] Bus 460 includes a pathway for transmitting information between various components of the device, such as processor 410, disk drive 420, input / output interface 430, network interface 440, and memory 450.
[0131] It should be noted that although the above-described device only shows the processor 410, disk drive 420, input / output interface 430, network interface 440, memory 450, bus 460, etc., in specific implementations, the device may also include other components necessary for normal operation. Furthermore, those skilled in the art will understand that the above-described device may only include the components necessary for implementing the method of this application, and does not necessarily include all the components shown in the figures.
[0132] The program code used to implement the methods of this disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that when executed by the processor or controller, the program code causes the functions / operations specified in the flowcharts and / or block diagrams to be implemented. The program code may be executed entirely on a machine, partially on a machine, as a standalone software package partially on a machine and partially on a remote machine, or entirely on a remote machine or server.
[0133] In the context of this disclosure, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing. Furthermore, although operations are depicted in a specific order, this should not be understood as requiring that such operations be performed in the specific order shown or in sequential order, or that all illustrated operations be performed, to achieve the desired result. In certain environments, multitasking and parallel processing may be advantageous. Similarly, while several specific implementation details are included in the foregoing discussion, these should not be construed as limiting the scope of this disclosure. Certain features described in the context of individual embodiments may also be implemented in combination in a single implementation. Conversely, various features described in the context of a single implementation may also be implemented individually or in any suitable sub-combination in multiple implementations.
[0134] Although the subject matter has been described using language specific to structural features and / or methodological logic, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. Rather, the specific features and actions described above are merely illustrative examples of implementing the claims.
Claims
1. A federated semi-supervised target detection method for aquatic ecological monitoring, characterized in that the method is applied to a client in a federated semi-supervised learning framework, the federated semi-supervised learning framework comprising a server and multiple clients, each client locally storing unlabeled samples and labeled samples corresponding to underwater targets, the method comprising: constructing a student detection model and multiple teacher detection models, wherein the student detection model and the teacher detection models have the same target detection network structure; outputting, by each of the teacher detection models, a prediction result based on the unlabeled samples, mapping the information entropy corresponding to each prediction result to a weight of the corresponding teacher detection model, and obtaining a fused pseudo-label by weighted summation of the prediction results based on the weights, wherein the information entropy is inversely proportional to the weight; constructing a target consistency loss corresponding to the unlabeled samples, the target consistency loss being used to constrain the consistency between the output of the student detection model based on the unlabeled samples and the fused pseudo-label; constructing a total loss function based on the target consistency loss and the supervised loss corresponding to the labeled samples, and training the student detection model according to the total loss function; and, after the student detection model is trained, uploading the model parameters of the student detection model to the server, and updating the student detection model based on the global model parameters returned by the server, so as to detect underwater targets according to the updated student detection model, wherein the server is used to perform weighted aggregation of the model parameters uploaded by each client to obtain the global model parameters.
2. The federated semi-supervised target detection method for aquatic ecological monitoring according to claim 1, characterized in that mapping the information entropy corresponding to each prediction result to the weight of each teacher detection model comprises: for the prediction result output by any one of the teacher detection models, calculating the information entropy corresponding to the prediction result, and determining the weight of that teacher detection model based on the softmax-normalized value of the negative information entropy.
3. The federated semi-supervised target detection method for aquatic ecological monitoring according to claim 1, characterized in that the method further comprises: for any one of the teacher detection models, updating the parameters of that teacher detection model by an exponential moving average based on the current parameters of the student detection model, wherein different teacher detection models use different momentum coefficients in the exponential moving average update.
4. The federated semi-supervised target detection method for aquatic ecological monitoring according to claim 1, characterized in that constructing the target consistency loss corresponding to the unlabeled samples comprises: calculating the intersection-over-union (IoU) consistency loss, the center consistency loss, and the class probability consistency loss when the student detection model and the teacher detection models are trained based on the unlabeled samples, wherein the IoU consistency loss is used to constrain the consistency of the degree of overlap of the bounding boxes, the center consistency loss is used to constrain the consistency of the spatial location of the target, and the class probability consistency loss is used to constrain the consistency of the class judgment for the same target; and weighting the IoU consistency loss, the center consistency loss, and the class probability consistency loss to obtain the target consistency loss.
5. The federated semi-supervised target detection method for aquatic ecological monitoring according to claim 4, characterized in that the formula for calculating the intersection-over-union (IoU) consistency loss is as follows: , where $N$ is the number of predicted boxes, $b_i^t$ is the $i$-th predicted box output by the teacher detection model, $\hat{b}_i^s$ is the predicted box in the output set of the student detection model that has the largest IoU with $b_i^t$, and $\mathrm{IoU}(b_i^t, \hat{b}_i^s)$ denotes the IoU between $b_i^t$ and $\hat{b}_i^s$; the formula for calculating the center consistency loss is: , where $C_s$ is the set of predicted box centers of the student detection model, $C_t$ is the set of predicted box centers of the teacher detection model, $\hat{c}_t$ is the center of the first predicted box, which is matched to the center $c_s$ of a predicted box of the student detection model, $\hat{c}_s$ is the center of the second predicted box, which is matched to the center $c_t$ of a predicted box of the teacher detection model, $c_s$ is the center of a predicted box of the student detection model, and $c_t$ is the center of a predicted box of the teacher detection model; and the formula for calculating the class probability consistency loss is: $L_{cls} = D_{KL}\left(P_t \,\|\, P_s\right)$, where $P_t$ is the class probability of a predicted bounding box from the teacher detection model, $P_s$ is the class probability of the corresponding predicted bounding box from the student detection model, and $D_{KL}(P \,\|\, Q)$ denotes the KL divergence, which takes $P$ as the true distribution and $Q$ as the approximating distribution and measures the difference between the two probability distributions.
6. The federated semi-supervised target detection method for aquatic ecological monitoring according to claim 1, characterized in that the formula for calculating the total loss function is as follows: $L_{total} = L_{sup}(D_l) + L_{cons}(D_u)$, where $L_{cons}$ is the target consistency loss, $L_{sup}$ is the supervised loss, $D_l$ is the labeled sample set, and $D_u$ is the unlabeled sample set.
7. The federated semi-supervised target detection method for aquatic ecological monitoring according to claim 1, characterized in that, after updating the student detection model based on the global model parameters returned by the server, the method further comprises: performing the next round of federated communication training with the updated student detection model.
8. A federated semi-supervised target detection device for aquatic ecological monitoring, characterized in that the device is applied to a client in a federated semi-supervised learning framework, the federated semi-supervised learning framework comprising a server and multiple clients, each client locally storing unlabeled samples and labeled samples corresponding to underwater targets, the device comprising: a model building module configured to build a student detection model and multiple teacher detection models, wherein the student detection model and the teacher detection models have the same target detection network structure; a pseudo-label generation module configured to have each of the teacher detection models output a prediction result based on the unlabeled samples, map the information entropy corresponding to each prediction result to a weight of the corresponding teacher detection model, and obtain a fused pseudo-label by weighted summation of the prediction results based on the weights, wherein the information entropy is inversely proportional to the weight; a consistency loss construction module configured to construct a target consistency loss corresponding to the unlabeled samples, the target consistency loss being used to constrain the consistency between the output of the student detection model based on the unlabeled samples and the fused pseudo-label; a total loss construction module configured to construct a total loss function based on the target consistency loss and the supervised loss corresponding to the labeled samples, so as to train the student detection model according to the total loss function; and a parameter update module configured to upload the model parameters of the student detection model to the server after the student detection model is trained, and to update the student detection model based on the global model parameters returned by the server, so as to detect underwater targets according to the updated student detection model, wherein the server is used to perform weighted aggregation of the model parameters uploaded by each client to obtain the global model parameters.
9. An electronic device, characterized in that it comprises: one or more processors; and a memory associated with the one or more processors, the memory being used to store program instructions that, when read and executed by the one or more processors, perform the steps of the federated semi-supervised target detection method for aquatic ecological monitoring according to any one of claims 1-7.
10. A computer program product, characterized in that it comprises a computer program which, when executed by a processor, implements the federated semi-supervised target detection method for aquatic ecological monitoring according to any one of claims 1-7.