A cloud-edge collaboration method based on multi-source heterogeneous information fusion

CN120766078B · Active · Publication Date: 2026-06-26 · EAST CHINA UNIV OF SCI & TECH +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
EAST CHINA UNIV OF SCI & TECH
Filing Date
2025-06-27
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

In the context of big data, how to effectively mine information and build an efficient collaborative optimization mechanism in the fusion of multi-source heterogeneous information, and ensure the accuracy and speed of the system in complex environments, especially how to identify conflicts between different edges and build cloud-edge collaborative decision-making methods.

Method used

By employing multimodal image processing and fusion, improved Faster R-CNN network, multi-label training dataset, evidence distance matrix and Dempster rule, and cloud-side reinforcement learning decision model, global optimization is achieved through feature extraction, evidence fusion and reinforcement learning to realize cloud-edge collaboration.

Benefits of technology

It achieves low-conflict, high-reliability cloud-edge collaboration in complex environments, ensuring the accuracy and speed of information fusion, avoiding counterintuitive conclusions, and enhancing the system's ability to adapt to complex environments.

✦ Generated by Eureka AI based on patent content.


Abstract

The application provides a cloud edge collaborative method based on multi-source heterogeneous information fusion, comprising image processing, Faster R-CNN network construction, basic belief distribution output, multi-view basic belief distribution fusion through supervised and unsupervised methods, target state feature extraction, reinforcement learning model reasoning update, cloud side classification head migration and edge side model fusion, and edge reasoning update under multi-view image input. The application extracts and efficiently integrates the multi-modal data collected by different sensors, introduces an evidence fusion strategy to ensure the reliability of the evidence, introduces an evidence correction strategy for the high conflict problem that may exist in the multiple edge side fused evidence, combines the conflict identification and dynamic management mechanism to effectively avoid the generation of counter-intuitive conclusions, globally optimizes through reinforcement learning, realizes the dynamic interaction and intelligent feedback between the cloud and the edge, and ensures the low conflict and high reliability of the system in a complex environment.

Description

Technical Field

[0001] This invention relates to the field of cloud-edge collaboration technology, and in particular to a cloud-edge collaboration method based on multi-source heterogeneous information fusion.

Background Technology

[0002] In the era of big data, the types of data are increasing and the scale of data is growing day by day. The need to extract information with high utilization value from massive and complex data is becoming more and more urgent. Information extracted by using a single data source and method may have certain biases, while information fusion using data from multiple sources can fully explore the inherent characteristics and patterns contained in the information.

[0003] Multi-source heterogeneous information fusion aims to extract hidden, high-value information from numerous data sources and various types of data. However, in the context of big data, the data structure varies greatly, the sources are wide, and the real-time nature is strong. How to extract effective information from multi-dimensional and massive big data, establish reliable conflict management mechanisms and collaborative decision-making methods, and ensure the accuracy and speed of reasoning has become the key to multi-source heterogeneous information fusion research.

[0004] The cloud-edge collaborative decision-making method utilizing multi-source heterogeneous information fusion involves key technologies such as information fusion at the data layer, evidence correction at the feature layer, and conflict management at the decision layer. It requires mining deep features from massive signals from different sources. However, current research still faces several difficulties and challenges: on the one hand, the system's adaptability to complex environments and how to identify conflicts between different edges require in-depth exploration; on the other hand, cloud-edge collaborative decision-making requires the integration of global information and decision optimization, and constructing an efficient collaborative optimization mechanism also presents challenges. Therefore, designing a cloud-edge collaborative method based on multi-source heterogeneous information fusion is essential.

Summary of the Invention

[0005] In order to overcome the shortcomings of the existing technology, the purpose of this invention is to provide a cloud-edge collaborative method based on multi-source heterogeneous information fusion.

[0006] To achieve the above objectives, the present invention provides the following solution:

[0007] This invention provides a cloud-edge collaborative method based on multi-source heterogeneous information fusion, comprising:

[0008] Step 1: Acquire multimodal images, process and fuse them. Decompose the multimodal images into basic parts and detailed content. Use a weighted average strategy to fuse the basic parts and use the VGG-19 network to extract features from the detailed content. Generate the fused detailed content by selecting a strategy, and finally reconstruct the fused image.

[0009] Step 2: Construct the Faster R-CNN network and multi-label training dataset. Replace the single-label classification head of Faster R-CNN with a multi-label Sigmoid layer. Train the network using multi-label binary cross-entropy loss combined with bounding box regression loss and output the multi-label membership probability vector of the target candidate region.

[0010] Step 3: The basic belief assignment from multiple perspectives is fused using supervised and unsupervised methods. The supervised weights are based on the historical accuracy of the sensors. The unsupervised weights are determined by constructing an evidence distance matrix, calculating the distance between each piece of evidence, determining the best and worst BPA, using Deng's relative entropy to measure the best-other vector and other-worst vector, solving the evidence weights through a constrained optimization problem, determining the acceptance threshold through sensitivity analysis, discounting the evidence based on the fusion of the two weights, and generating the final belief assignment using the Dempster rule.

[0011] Step 4: Construct a cloud-based reinforcement learning decision model. Based on the prediction results of the edge model and the cloud model, extract the state features of each target category, construct the state space, and use a deep Q-network to build a reinforcement learning model. Train the model to obtain a transfer decision strategy based on performance gain.

[0012] Step 5: Fuse the cloud-side classification head transfer with the edge-side model. Based on the transfer action output by the reinforcement learning system, fine-tune the classification head parameters of the Faster R-CNN model for the selected category to form a transfer patch. Then write the patch into the corresponding position of the original model to build a fused model, and complete the edge inference update under multi-view image input.

[0013] Preferably, in step 1, the multimodal image is decomposed into a basic part and detailed content. A weighted average strategy is used to fuse the basic part, and the VGG-19 network is used to extract features from the detailed content. A strategy is selected to generate the fused detailed content, and finally, the fused image is reconstructed. Specifically:

[0014] Acquire multimodal images, including visible light images, infrared images, and SAR images;

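[0015] For each input image $I_K$, where K indexes the visible light image, the infrared image, and the SAR image, the basic part and the detailed content are obtained by solving an optimization problem:

[0016] $I_K^b = \arg\min_{I_K^b} \lVert I_K - I_K^b \rVert_F^2 + \lambda \left( \lVert g_x * I_K^b \rVert_F^2 + \lVert g_y * I_K^b \rVert_F^2 \right)$

[0017] In the formula, $I_K^b$ is the basic part, $g_x = [-1, 1]$ and $g_y = [-1, 1]^T$ are the horizontal and vertical gradient operators, and $\lambda = 5$. The detailed content is $I_K^d = I_K - I_K^b$;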

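[0018] The basic parts are fused using a weighted average strategy, as follows:

[0019] $F_b(x, y) = \alpha_1 I_1^b(x, y) + \alpha_2 I_2^b(x, y) + \alpha_3 I_3^b(x, y)$

[0020] In the formula, (x, y) are the pixel coordinates; $I_1^b$, $I_2^b$, and $I_3^b$ are the basic parts of the visible light image, the infrared image, and the SAR image, respectively; and $\alpha_1$, $\alpha_2$, and $\alpha_3$ are the fusion weights.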

[0021] The pre-trained VGG-19 network is used to extract multi-layer features from the detailed content, covering the ReLU1_1 to ReLU4_1 layers. The activity level map is calculated using the L1 norm, and a weight map is generated by Softmax. After upsampling, the detailed contents are weighted and fused. Finally, the fused detailed content $F_d$ is generated using the maximum selection strategy;

[0022] The fused basic part and the fused detailed content are added together to obtain the final fused image:

[0023] $F = F_b + F_d$.

[0024] Preferably, in step 2, constructing a multi-label training dataset specifically involves:

[0025] Obtain the label vector $Y = [y_1, y_2, \ldots, y_L]$ for each target instance, where $y_l \in \{0, 1\}$ indicates whether the instance belongs to the l-th label class, and construct the multi-label training dataset.

[0026] Preferably, in step 2, the loss function of the Faster R-CNN network is:

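[0027] $L = L_{BCE}(\hat{Y}, Y) + L_{reg}(\hat{B}, B)$

[0028] In the formula, $L_{BCE}$ is the multi-label binary cross-entropy loss and $L_{reg}$ is the bounding box regression loss; $\hat{Y}$ is the predicted label probability vector, and $Y$ is the true label vector; $\hat{B}$ and $B$ are the predicted and actual bounding box coordinates, respectively;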

[0029] The Faster R-CNN network outputs the instance number, predicted class, bounding box coordinates, and confidence scores for all classification labels. The detection results from each sensor are converted into a unified format: [instance index, class, bounding box, probability array].

[0030] Preferably, in step 3, the basic belief assignment from multiple perspectives is fused using supervised and unsupervised methods. The supervised method is based on the historical accuracy of the sensors; the unsupervised method constructs an evidence distance matrix, calculates the distance between each piece of evidence, determines the best and worst BPA, uses Deng's relative entropy to measure the best-others and other-worst vectors, solves the evidence weights through a constrained optimization problem, determines the acceptance threshold through sensitivity analysis, discounts the evidence based on the fusion of the two weights, and generates the final belief assignment using the Dempster rule. Specifically:

[0031] Construct the Jousselme evidence distance matrix based on multi-label membership probability vectors;

[0032] Calculate the distance measure $CD(m_j)$ of each piece of evidence; the evidence with the largest CD is taken as the worst belief $m_W$, and the evidence furthest from $m_W$ is taken as the best belief $m_B$;

[0033] The best-others and other-worst vectors are calculated based on Deng's relative entropy, and the evidence weights $\omega_j$ are solved through a constrained optimization problem. The acceptance threshold $\xi_{max}$ is determined through sensitivity analysis: if $\xi \le \xi_{max}$, the weights are accepted; otherwise, the comparison matrix is readjusted.

[0034] Supervised weights are determined based on the historical accuracy of sensors, and the two weights are then weighted and fused.

[0035] The beliefs are assigned based on the weighted discounts after fusion, and then fused using the Dempster rule to obtain the final belief assignment.

[0036] Preferably, in step 4, a cloud-side reinforcement learning decision model is constructed. Based on the prediction results of the edge model and the cloud model, the state features of each target category are extracted, a state space is constructed, and a deep Q-network is used to establish a reinforcement learning model. The model is then trained to obtain a transfer decision strategy based on performance gains. Specifically:

[0037] Based on the prediction results of the edge model and the cloud model, a state feature vector is extracted for each target category. The state feature vector includes the edge-side category accuracy $A_{edge}$, the cloud-side category accuracy $A_{cloud}$, and the difference between the two, $\Delta = A_{cloud} - A_{edge}$, forming the state vector $S = [A_{edge}, A_{cloud}, \Delta]$;

[0038] Construct a reinforcement learning agent with a deep Q-network as its core, and define an action set A = {0, 1}, where 0 represents skipping the transfer and 1 represents triggering the current class transfer operation;

[0039] A reward mechanism is set up: after the transfer is executed, the accuracy change $\delta = A_{cloud} - A_{edge}$ of this category on the edge model is calculated, and the reward R is computed as:

[0040]

[0041] In the formula, $\theta$ is the minimum accuracy gain threshold, $\delta'$ is the accuracy of the previous effective transfer, $A_{cloud}$ is the cloud-side accuracy of the current model, and $A_{edge}$ is the edge-side accuracy of the current model.

[0042] The policy is generated and iterated through the Q-function, outputting a transfer policy $\pi(s) = \arg\max_a Q(s, a)$ based on the state S, which indicates whether to perform a transfer for the target category.

[0043] Preferably, in step 5, the cloud-side classification head transfer and the edge-side model are fused. Based on the transfer action output by the reinforcement learning system, the classification head parameters of the Faster R-CNN model for the selected category are fine-tuned to form a transfer patch. The patch is then written into the corresponding position of the original model to construct a fused model, and edge inference updates are completed under multi-view image input. Specifically:

[0044] Based on the output of the generated transfer strategy, select the category with action 1 and trigger the transfer training of the corresponding classification head;

[0045] After each transfer training iteration, the accuracy of this category on the cloud model and on the edge model is calculated and denoted as $A_{cloud}$ and $A_{edge}$, and the accuracy difference $\delta = A_{cloud} - A_{edge}$ is computed. If $\delta \le \theta$, the transfer is considered complete, the transfer training patch for that class is saved, and the transfer operation for that class will not be triggered again in subsequent training cycles;

[0046] During the transfer training process, all parameters in the Faster R-CNN network except for the classifier head corresponding to the current target class are frozen. Backpropagation and parameter optimization are performed only on the weight vector head.score.weight and the bias term head.score.bias corresponding to the class. After the transfer is completed, the patch parameters obtained by training are stored as a dedicated lightweight update module for the class and replaced and written to the corresponding position of the edge model to complete the local update and model fusion deployment.

[0047] Once all target categories have completed the transfer and their cloud-edge accuracy difference δ≤θ, the entire training process will automatically terminate.
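The patch-and-replace step and the stopping test can be illustrated with a minimal sketch. It assumes the classifier parameters are held as plain Python lists keyed by the names head.score.weight and head.score.bias from the text; the functions make_patch, apply_patch, and transfer_finished are hypothetical helpers, and no actual fine-tuning is performed here.

```python
def make_patch(cloud_params, class_idx):
    """Extract the classifier row and bias of one class as a lightweight patch."""
    return {
        "class_idx": class_idx,
        "weight": list(cloud_params["head.score.weight"][class_idx]),
        "bias": cloud_params["head.score.bias"][class_idx],
    }


def apply_patch(edge_params, patch):
    """Return a copy of the edge parameters with only the patched class replaced."""
    k = patch["class_idx"]
    weights = [list(row) for row in edge_params["head.score.weight"]]
    biases = list(edge_params["head.score.bias"])
    weights[k] = list(patch["weight"])
    biases[k] = patch["bias"]
    return {"head.score.weight": weights, "head.score.bias": biases}


def transfer_finished(acc_cloud, acc_edge, theta):
    """Stopping test from the text: a class is done once delta <= theta."""
    return (acc_cloud - acc_edge) <= theta


def all_finished(per_class_acc, theta):
    """Training terminates when every class passes the stopping test.

    per_class_acc maps a class name to a (acc_cloud, acc_edge) pair.
    """
    return all(transfer_finished(c, e, theta) for c, e in per_class_acc.values())
```

Because apply_patch copies the parameter lists, the rows of all other classes stay untouched, which mirrors the frozen-parameter behavior described above.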

[0048] According to specific embodiments provided by the present invention, the present invention discloses the following technical effects:

[0049] This invention provides a cloud-edge collaborative method based on multi-source heterogeneous information fusion. Multimodal images are acquired, processed, and fused: each image is decomposed into a basic part and detailed content, the basic parts are fused using a weighted average strategy, features are extracted from the detailed content using a VGG-19 network, the fused detailed content is generated through a selection strategy, and the fused image is reconstructed. A Faster R-CNN network and a multi-label training dataset are constructed: the single-label classification head of Faster R-CNN is replaced with a multi-label Sigmoid layer, the network is trained using multi-label binary cross-entropy loss combined with bounding box regression loss, and the multi-label membership probability vector of each target candidate region is output. Multi-view basic belief assignments are fused through supervised and unsupervised methods. The supervised weights are based on the historical accuracy of the sensors. The unsupervised weights are obtained by constructing an evidence distance matrix, calculating the distance between each piece of evidence, determining the best and worst BPA, using Deng's relative entropy to measure the best-others and other-worst vectors, solving the evidence weights through a constrained optimization problem, and determining the acceptance threshold through sensitivity analysis. The two weights are fused, the evidence is discounted based on the fused weights, and the final belief assignment is generated through the Dempster rule. A cloud-side reinforcement learning decision model is constructed: based on the prediction results of the edge model and the cloud model, the state features of each target category are extracted, a state space is constructed, a deep Q-network is used to build a reinforcement learning model, and the model is trained to obtain a transfer decision strategy based on performance gain. The cloud-side classification head transfer and the edge-side model are fused: based on the transfer actions output by the reinforcement learning system, the classification head parameters of the Faster R-CNN model for the selected category are fine-tuned to form a transfer patch, the patch is written into the corresponding position of the original model to construct a fused model, and edge inference updates are completed under multi-view image input. In the information fusion stage, this invention performs feature extraction and efficient integration of multimodal data collected by different sensors, introduces an evidence fusion strategy to ensure the reliability of the evidence, introduces an evidence correction strategy to address the potential high conflict in evidence fused from multiple edges, and combines conflict identification and dynamic management mechanisms to effectively avoid counterintuitive conclusions. Finally, global optimization is performed through reinforcement learning to achieve dynamic interaction and intelligent feedback between the cloud and the edge, ensuring low conflict and high reliability of the system in complex environments.

Attached Figure Description

[0050] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0051] Figure 1 is a schematic diagram of the cloud-edge collaborative method based on multi-source heterogeneous information fusion provided in an embodiment of the present invention;

[0052] Figure 2 is a schematic diagram of the multi-source heterogeneous information fusion method.

[0053] Figure 3 is a schematic diagram of the cloud-based decision-making integration process.

[0054] Figure 4 is a schematic diagram of the classification head transfer process for a reinforcement learning-based target detection model under cloud-edge collaboration.

Detailed Implementation

[0055] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0056] The purpose of this invention is to provide a cloud-edge collaborative method based on multi-source heterogeneous information fusion. In the information fusion stage, feature extraction and efficient integration are performed on multimodal data collected by different sensors. An evidence fusion strategy is introduced to ensure the reliability of the evidence. To address the potential high conflict problem of evidence after fusion of multiple edge devices, an evidence correction strategy is introduced. Combined with conflict identification and dynamic management mechanisms, the generation of counterintuitive conclusions is effectively avoided. Finally, global optimization is performed through reinforcement learning to achieve dynamic interaction and intelligent feedback between the cloud and the edge, ensuring low conflict and high reliability of the system in complex environments.

[0057] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.

[0058] Figure 1 is a schematic diagram of the cloud-edge collaborative method based on multi-source heterogeneous information fusion provided in an embodiment of the present invention. As shown in Figure 1, this invention provides a cloud-edge collaborative method based on multi-source heterogeneous information fusion, comprising:

[0059] Step 1: Acquire multimodal images, deploy feature extraction and image recognition models on the edge side, and process and fuse the images: decompose the multimodal images into basic parts and detailed content, use a weighted average strategy to fuse the basic parts, use the VGG-19 network to extract features from the detailed content, generate the fused detailed content by selecting a strategy, and finally reconstruct the fused image.

[0060] Step 2: Construct the Faster R-CNN network and multi-label training dataset. Replace the single-label classification head of Faster R-CNN with a multi-label Sigmoid layer. Train the network using multi-label binary cross-entropy loss combined with bounding box regression loss and output the multi-label membership probability vector of the target candidate region.

[0061] Step 3: Deploy supervised and unsupervised methods on the cloud to fuse multi-perspective basic belief assignments. The supervised method is based on the historical accuracy of sensors; the unsupervised method constructs an evidence distance matrix, calculates the distance between each piece of evidence, determines the best and worst BPA, uses Deng relative entropy to measure the best-other vector and other-worst vector, solves the evidence weights through a constrained optimization problem, determines the acceptance threshold through sensitivity analysis, discounts the evidence based on the fusion of the two weights, and generates the final belief assignment using the Dempster rule.

[0062] Step 4: Construct a cloud-based reinforcement learning decision model. Based on the prediction results of the edge model and the cloud model, extract the state features of each target category, construct the state space, and use a deep Q-network to build a reinforcement learning model. Train the model to obtain a transfer decision strategy based on performance gain.

[0063] Step 5: Fuse the cloud-side classification head transfer with the edge-side model. Based on the transfer action output by the reinforcement learning system, fine-tune the classification head parameters of the Faster R-CNN model for the selected category to form a transfer patch. Then write the patch into the corresponding position of the original model to build a fused model, and complete the edge inference update under multi-view image input.

[0064] As shown in Figure 2, in step 1, the multimodal image is decomposed into a basic part and detailed content. A weighted average strategy is used to fuse the basic part, and the VGG-19 network is used to extract features from the detailed content. The fused detailed content is generated by selecting a strategy, and finally the fused image is reconstructed. Specifically:

[0065] Acquire multimodal images, including visible light images, infrared images, and SAR images;

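[0066] For each input image $I_K$, where K indexes the visible light image, the infrared image, and the SAR image, the basic part and the detailed content are obtained by solving an optimization problem:

[0067] $I_K^b = \arg\min_{I_K^b} \lVert I_K - I_K^b \rVert_F^2 + \lambda \left( \lVert g_x * I_K^b \rVert_F^2 + \lVert g_y * I_K^b \rVert_F^2 \right)$

[0068] In the formula, $I_K^b$ is the basic part, $g_x = [-1, 1]$ and $g_y = [-1, 1]^T$ are the horizontal and vertical gradient operators, and $\lambda = 5$. The detailed content is $I_K^d = I_K - I_K^b$;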

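[0069] The basic parts are fused using a weighted average strategy, as follows:

[0070] $F_b(x, y) = \alpha_1 I_1^b(x, y) + \alpha_2 I_2^b(x, y) + \alpha_3 I_3^b(x, y)$

[0071] In the formula, (x, y) are the pixel coordinates; $I_1^b$, $I_2^b$, and $I_3^b$ are the basic parts of the visible light image, the infrared image, and the SAR image, respectively; and $\alpha_1$, $\alpha_2$, and $\alpha_3$ are the fusion weights.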

[0072] The pre-trained VGG-19 network is used to extract multi-layer features from the detailed content, covering the ReLU1_1 to ReLU4_1 layers. The activity level map is calculated using the L1 norm, and a weight map is generated by Softmax. After upsampling, the detailed contents are weighted and fused. Finally, the fused detailed content $F_d$ is generated using the maximum selection strategy;

[0073] The fused basic part and the fused detailed content are added together to obtain the final fused image:

[0074] $F = F_b + F_d$.
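The fusion stage of step 1 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: images are small nested lists, the basic/detailed decomposition has already been computed, and the activity level maps (which the text derives from the L1 norm of upsampled VGG-19 features) are supplied as inputs rather than computed from a network. All function names are hypothetical.

```python
import math


def fuse_base(bases, weights):
    """Pixel-wise weighted average of the basic parts (one weight per source)."""
    h, w = len(bases[0]), len(bases[0][0])
    return [[sum(a * b[y][x] for a, b in zip(weights, bases)) for x in range(w)]
            for y in range(h)]


def softmax_weight_maps(activities):
    """Per-pixel softmax across sources, turning activity maps into weight maps."""
    h, w = len(activities[0]), len(activities[0][0])
    maps = [[[0.0] * w for _ in range(h)] for _ in activities]
    for y in range(h):
        for x in range(w):
            vals = [a[y][x] for a in activities]
            peak = max(vals)  # subtract the peak for numerical stability
            exps = [math.exp(v - peak) for v in vals]
            total = sum(exps)
            for k, e in enumerate(exps):
                maps[k][y][x] = e / total
    return maps


def fuse_detail(details, weight_maps):
    """Weighted sum of the detailed contents for one feature layer."""
    h, w = len(details[0]), len(details[0][0])
    return [[sum(m[y][x] * d[y][x] for m, d in zip(weight_maps, details))
             for x in range(w)] for y in range(h)]


def max_select(candidates):
    """Maximum selection across the per-layer fused detail candidates."""
    h, w = len(candidates[0]), len(candidates[0][0])
    return [[max(c[y][x] for c in candidates) for x in range(w)] for y in range(h)]


def reconstruct(fused_base, fused_detail):
    """F = F_b + F_d, pixel by pixel."""
    return [[b + d for b, d in zip(rb, rd)] for rb, rd in zip(fused_base, fused_detail)]
```

In use, fuse_detail would be called once per VGG-19 layer with that layer's weight maps, and max_select would then pick the final $F_d$ from the per-layer results.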

[0075] In step 2, a multi-label training dataset is constructed, specifically as follows:

[0076] Obtain the label vector $Y = [y_1, y_2, \ldots, y_L]$ for each target instance, where $y_l \in \{0, 1\}$ indicates whether the instance belongs to the l-th label class, and construct the multi-label training dataset.

[0077] In step 2, the loss function of the Faster R-CNN network is:

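[0078] $L = L_{BCE}(\hat{Y}, Y) + L_{reg}(\hat{B}, B)$

[0079] In the formula, $L_{BCE}$ is the multi-label binary cross-entropy loss and $L_{reg}$ is the bounding box regression loss; $\hat{Y}$ is the predicted label probability vector, and $Y$ is the true label vector; $\hat{B}$ and $B$ are the predicted and actual bounding box coordinates, respectively;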

[0080] The Faster R-CNN network outputs the instance number, predicted category, bounding box coordinates, and confidence score for all classification labels. The detection results from each sensor are converted into a unified format: [instance index, category, bounding box, probability array].
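As a small illustration of the classification term and the unified output record, the sketch below computes a multi-label binary cross-entropy over Sigmoid outputs and packs one detection into the [instance index, category, bounding box, probability array] format. Averaging the loss over labels and taking the category as the highest-probability label are assumptions made for this example, and the function names are hypothetical.

```python
import math


def multilabel_bce(probs, labels, eps=1e-12):
    """Mean binary cross-entropy over L independent Sigmoid outputs."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def to_unified_record(instance_index, class_names, box, probs):
    """Pack one detection as [instance index, category, bounding box, probability array].

    The category is taken as the label with the highest probability (assumption).
    """
    best = max(range(len(probs)), key=probs.__getitem__)
    return [instance_index, class_names[best], list(box), list(probs)]
```

A record built this way keeps the full probability array, which is what the later evidence fusion step consumes.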

[0081] For different sensor poses and heights, the sensor with the most detected instances is selected as the reference viewpoint, and the detection bounding box is rotated and scaled to be mapped to the reference coordinate system.

[0082] The target pool is initialized with instances detected by the benchmark sensor. The IoU matrix between the benchmark sensor and the current sensor detection results is calculated as the matching cost. The optimal matching pair is solved by linear allocation using the Hungarian algorithm. An IoU threshold greater than 0.5 is set to filter valid matches. Successfully matched targets are fused, and new targets that are not matched are added to the target pool as new instances.
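The matching step can be sketched as follows. To stay dependency-free, this example replaces the Hungarian algorithm with a brute-force search over assignments that maximizes the total IoU; it reaches the same optimum but has factorial cost, so it is only suitable for a handful of boxes. Boxes are assumed to be (x1, y1, x2, y2) tuples already mapped to the reference coordinate system, and the function names are hypothetical.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(reference, current, iou_threshold=0.5):
    """Return (matched pairs, indices of unmatched current boxes).

    Brute-force stand-in for the Hungarian algorithm: try every one-to-one
    assignment of the smaller set into the larger one and keep the best.
    """
    swapped = len(current) > len(reference)
    small, large = (reference, current) if swapped else (current, reference)
    best_total, best_perm = -1.0, ()
    for perm in permutations(range(len(large)), len(small)):
        total = sum(iou(small[i], large[j]) for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm
    pairs = []
    for i, j in enumerate(best_perm):
        ref_idx, cur_idx = (i, j) if swapped else (j, i)
        if iou(reference[ref_idx], current[cur_idx]) > iou_threshold:
            pairs.append((ref_idx, cur_idx))
    matched = {c for _, c in pairs}
    new_instances = [c for c in range(len(current)) if c not in matched]
    return pairs, new_instances
```

Matched pairs would then be fused, while the indices in new_instances are appended to the target pool as new instances.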

[0083] As shown in Figure 3, in step 3, to address the potential high conflict between evidence from multiple edge-side sensors, an evidence correction strategy is introduced. Combined with conflict identification and dynamic management mechanisms, this effectively avoids counterintuitive conclusions. Multi-perspective basic belief assignments (BPA) are fused using supervised and unsupervised methods. The supervised method is based on the historical accuracy of the sensors; the unsupervised method constructs an evidence distance matrix, calculates the distance between each piece of evidence, determines the best and worst BPA, uses Deng's relative entropy to measure the best-others and other-worst vectors, solves the evidence weights through a constrained optimization problem, calculates the consistency ratio to verify the reliability of the weights, discounts the evidence based on the weights, and generates the final belief assignment using the Dempster rule. Specifically:

[0084] Based on the multi-label membership probability vectors, the probability arrays of the same instance detected by multiple edges are used to construct singleton focal elements, and the Jousselme distance matrix between $m_i$ and $m_j$ is computed:

[0085] $d(m_i, m_j) = \sqrt{\tfrac{1}{2} (m_i - m_j)^T D (m_i - m_j)}$

[0086] where D is the Jaccard matrix, $D(A, B) = \frac{|A \cap B|}{|A \cup B|}$;
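When every focal element is a singleton, the Jaccard similarity between two different focal elements is 0 and between identical ones is 1, so D reduces to the identity matrix and the Jousselme distance becomes a scaled Euclidean distance. The sketch below relies on that simplification; normalizing the Sigmoid probability array so that the masses sum to 1 is an assumption of this example, and the function names are hypothetical.

```python
import math


def normalize(probs):
    """Scale a probability array so its entries sum to 1 (assumed BPA construction)."""
    total = sum(probs)
    return [p / total for p in probs]


def jousselme_singleton(m_i, m_j):
    """Jousselme distance for BPAs over singleton focal elements (D = identity)."""
    return math.sqrt(0.5 * sum((a - b) ** 2 for a, b in zip(m_i, m_j)))


def distance_matrix(bpas):
    """Symmetric matrix of pairwise distances between all pieces of evidence."""
    return [[jousselme_singleton(a, b) for b in bpas] for a in bpas]
```

For two fully contradictory sources, for example [1, 0] and [0, 1], the distance reaches its maximum of 1.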

[0087] Calculate the distance measure $CD(m_j)$ of each piece of evidence as:

[0088]

[0089] The evidence with the largest CD value is assigned as the worst belief $m_W$, meaning that this BPA contributes the most to the system's disorder level; the evidence furthest from $m_W$ is assigned as the best belief $m_B$;

[0090] Deng's relative entropy is used to calculate the information difference between two basic belief assignments $m_i$ and $m_j$:

[0091]

[0092] It satisfies non-negativity ($\sigma \ge 0$) and asymmetry ($\sigma(m_i \| m_j) \ne \sigma(m_j \| m_i)$);

[0093] Construct the relative entropy comparison matrix, comprising the vector from $m_B$ to the other BPAs and the vector from the other BPAs to $m_W$:

[0094] $M_B = (\sigma(m_B \| m_1), \sigma(m_B \| m_2), \ldots, \sigma(m_B \| m_n))$

[0095] $M_W = (\sigma(m_1 \| m_W), \sigma(m_2 \| m_W), \ldots, \sigma(m_n \| m_W))^T$

[0096] The evidence weights $\omega_j$ are obtained by solving a constrained optimization problem:

[0097] $\min \xi$

[0098]

[0099] $\sum_j \omega_j = 1, \quad \omega_j \ge 0$

[0100] The acceptance threshold $\xi_{max}$ is determined through sensitivity analysis: if $\xi \le \xi_{max}$, the weights are accepted; otherwise, the comparison matrix is readjusted.
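For orientation, the best-others and other-worst vectors, the objective $\min \xi$, and the consistency check all match the structure of the standard best-worst method. The block below states that standard formulation; reading the comparison values as $a_{Bj} = \sigma(m_B \| m_j)$ and $a_{jW} = \sigma(m_j \| m_W)$ is an assumption of this sketch, and the exact constraints used in this method may differ.

```latex
\begin{aligned}
\min_{\omega,\,\xi}\quad & \xi \\
\text{s.t.}\quad
& \left| \frac{\omega_B}{\omega_j} - a_{Bj} \right| \le \xi, \quad j = 1, \dots, n, \\
& \left| \frac{\omega_j}{\omega_W} - a_{jW} \right| \le \xi, \quad j = 1, \dots, n, \\
& \sum_{j=1}^{n} \omega_j = 1, \qquad \omega_j \ge 0 .
\end{aligned}
```

A small optimal $\xi$ indicates that the two comparison vectors are nearly consistent with a single weight vector, which is why it can serve as the quantity compared against $\xi_{max}$.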

[0101] The accuracy of each sensor under different viewpoints is calculated separately, and the supervised weights are determined from these accuracies. The supervised and unsupervised weights are then fused:

[0102] fusion weight $= \alpha \cdot$ supervised weight $+ (1 - \alpha) \cdot \omega_j$

[0103] With $\alpha$ provisionally set to 0.5, the belief assignments are discounted by the fusion weights, and the final belief assignment is obtained by fusion using the Dempster rule.
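The weight fusion, discounting, and combination steps can be sketched as follows. The text does not spell out the discounting operator, so this example assumes classical Shafer discounting, in which the mass removed from each focal element is moved to the whole frame; BPAs are represented as dictionaries from frozenset to mass, and all function names are hypothetical.

```python
def fusion_weight(supervised, unsupervised, alpha=0.5):
    """Fusion weight = alpha * supervised weight + (1 - alpha) * unsupervised weight."""
    return alpha * supervised + (1.0 - alpha) * unsupervised


def discount(bpa, weight, frame):
    """Shafer discounting (assumed): scale masses by weight, move the rest to the frame."""
    out = {k: weight * v for k, v in bpa.items() if k != frame}
    out[frame] = 1.0 - weight + weight * bpa.get(frame, 0.0)
    return out


def dempster(m1, m2):
    """Dempster's rule of combination for two BPAs over the same frame."""
    combined, conflict = {}, 0.0
    for a, va in m1.items():
        for b, vb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + va * vb
            else:
                conflict += va * vb  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}
```

Discounting before combination is what softens highly conflicting evidence: a low-weight source contributes most of its mass to the full frame, which intersects with everything and therefore generates no conflict.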

[0104] As shown in Figure 4, in step 4, a cloud-based reinforcement learning decision model is constructed. Based on the prediction results of the edge model and the cloud model, the state features of each target category are extracted, a state space is constructed, and a deep Q-network is used to establish a reinforcement learning model. The model is then trained to obtain a transfer decision policy based on performance gains. Specifically:

[0105] Based on the prediction results (result.xlsx) of the edge model and the cloud model, a state feature vector is extracted for each target category. The Excel fields include: aircraft (the detected aircraft), satellite_id (the satellite corresponding to the detected aircraft), soft_label_vector_edge (the edge-side soft label), confidence_edge (the edge-side category confidence), type_prediction_edge (the edge-side predicted aircraft category), confidence_cloud (the cloud-side confidence), and type_prediction_cloud (the cloud-side predicted category). The state feature vector includes the edge-side category accuracy $A_{edge}$, the cloud-side category accuracy $A_{cloud}$, and the difference between the two, $\Delta = A_{cloud} - A_{edge}$, forming the state vector $S = [A_{edge}, A_{cloud}, \Delta]$;

[0106] Construct a reinforcement learning agent with a deep Q-network as its core and select an action from {0, 1} using an ε-greedy rule: generate a random number; if it is less than ε, take a random action; otherwise, select the action $\arg\max_a Q(s, a)$ based on the current Q-network. Then perform the transfer and evaluate the new accuracy;

[0107] A reward mechanism is set up: after the transfer is executed, the accuracy change $\delta = A_{cloud} - A_{edge}$ of this category on the edge model is calculated, and the reward R is computed as:

[0108]

[0109] In the formula, θ is the minimum precision gain threshold, δ' is the accuracy of the previous effective transfer, and A cloud For the current model's cloud-side accuracy, A edge This represents the side accuracy of the current model.

[0110] The policy is generated and iterated through the Q-function: each tuple (state, action, reward, next_state) is added to the experience pool and used to update the DQN policy, which outputs a transfer policy $\pi(s) = \arg\max_a Q(s, a)$ based on the state S, indicating whether to perform a migration for the target category.
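
The experience pool and the Q-function update can be sketched as below. This is only an illustration under several assumptions: a linear Q-function stands in for the deep Q-network, the discount factor `GAMMA` and learning rate `LR` are assumed values, and the rewards in the sample transitions are made up rather than computed from the reward formula of the method.

```python
import random
from collections import deque

GAMMA = 0.9  # assumed discount factor
LR = 0.1     # assumed learning rate

class LinearQ:
    """Linear stand-in for the deep Q-network: Q(s, a) = w[a] . s"""

    def __init__(self, n_features=3, n_actions=2):
        self.w = [[0.0] * n_features for _ in range(n_actions)]

    def q_values(self, state):
        return [sum(wi * si for wi, si in zip(row, state)) for row in self.w]

    def policy(self, state):
        """pi(s) = argmax_a Q(s, a)."""
        q = self.q_values(state)
        return max(range(len(q)), key=lambda a: q[a])

    def update(self, batch):
        """One-step Q-learning update on a batch of transitions."""
        for state, action, reward, next_state in batch:
            target = reward + GAMMA * max(self.q_values(next_state))
            td_error = target - self.q_values(state)[action]
            self.w[action] = [wi + LR * td_error * si
                              for wi, si in zip(self.w[action], state)]

# Experience pool of (state, action, reward, next_state); values are illustrative.
pool = deque(maxlen=1000)
pool.append(([0.5, 0.75, 0.25], 1, 1.0, [0.7, 0.75, 0.05]))
pool.append(([0.7, 0.75, 0.05], 0, 0.0, [0.7, 0.75, 0.05]))

agent = LinearQ()
rng = random.Random(0)
for _ in range(50):
    agent.update(rng.sample(list(pool), k=min(2, len(pool))))

decision = agent.policy([0.5, 0.75, 0.25])  # 1 means "trigger migration"
```

Sampling from the pool instead of learning only from the latest transition is the usual reason for keeping an experience pool in DQN-style training.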

[0111] In step 5, cloud-side classification head transfer and edge-side model fusion are performed. Based on the transfer actions output by the reinforcement learning system, the classification head parameters of the Faster R-CNN model for the selected category are fine-tuned to form a transfer patch. This patch then replaces the corresponding position in the original model to construct a fused model, and edge inference updates are performed under multi-view image input. Specifically:

[0112] Based on the output of the generated transfer strategy, select the category with action 1 and trigger the transfer training of the corresponding classification head;

[0113] After each transfer learning iteration, the accuracy of this category on the cloud model and on the edge model is calculated, denoted $A_{cloud}$ and $A_{edge}$, and the accuracy difference $\delta = A_{cloud} - A_{edge}$ is computed. If $\delta \le \theta$, the transfer is considered complete, the transfer training patch for that class is saved, and the transfer operation for that class will not be triggered again in subsequent training cycles;

[0114] Construct a dataset and load the pre-trained Faster R-CNN model, freeze all parameters except those of this class, and train only the entries of the classification head parameters head.score.weight and head.score.bias that correspond to this class;

[0115] During transfer training, all network parameters in the Faster R-CNN network are frozen except for the classifier parameters corresponding to the current target class T. Only the classification weight vector and bias term for that class are fine-tuned to improve its detection performance in the edge model. Fine-tuning uses only predicted candidate regions (ROIs) that overlap strongly with the ground-truth bounding boxes (IoU > 0.5) as positive samples. The objective function consists of two parts:

[0116] $L_{total} = L_{cls} + \lambda L_{reg}$

[0117] In the formula, $L_{cls}$ is the classification loss, computed with the cross-entropy function to measure the class discrimination error on positive ROI regions; $L_{reg}$ is the regression loss, computed with the Smooth L1 loss for bounding box coordinate regression of positive ROIs; λ = 0.1 is the weighting coefficient of the regression loss, used to balance the classification and localization objectives; the selection criterion for positive ROI regions is an IoU with the ground-truth bounding box greater than 0.5; fine-tuning applies only to the classification head parameters corresponding to the current category, i.e.:

[0118] $W_T = \text{head.score.weight}[T]$
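
The two-part objective with λ = 0.1 and the IoU > 0.5 positive-sample rule can be sketched in plain Python. This is a simplified illustration: it applies Smooth L1 directly to box coordinates, whereas a real Faster R-CNN regresses encoded box offsets, and the ROI dictionary layout, function names, and averaging over positives are assumptions.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def cross_entropy(probs, target):
    """Classification loss for one ROI: negative log-probability of the target class."""
    return -math.log(max(probs[target], 1e-12))

def smooth_l1(pred, truth, beta=1.0):
    """Smooth L1 loss averaged over the four box coordinates."""
    total = 0.0
    for p, t in zip(pred, truth):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)

def total_loss(rois, gt_box, target, lam=0.1):
    """L_total = L_cls + lam * L_reg, averaged over positive ROIs (IoU > 0.5)."""
    positives = [r for r in rois if iou(r["box"], gt_box) > 0.5]
    if not positives:
        return 0.0
    return sum(cross_entropy(r["probs"], target)
               + lam * smooth_l1(r["box"], gt_box)
               for r in positives) / len(positives)

# Illustrative ROIs: the first matches the ground truth, the second does not overlap it.
gt = [0.0, 0.0, 10.0, 10.0]
rois = [
    {"box": [0.0, 0.0, 10.0, 10.0], "probs": [0.1, 0.9]},
    {"box": [20.0, 20.0, 30.0, 30.0], "probs": [0.5, 0.5]},
]
loss = total_loss(rois, gt, target=1)
```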

[0119] After the transfer is completed, the patch parameters obtained from training are stored as a dedicated lightweight update module for this category and written into the corresponding position of the edge model, completing the local update and the deployment of the fused model.
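
The patch mechanism amounts to copying one class row of the classification head into the edge model. A minimal sketch, using nested Python lists as stand-ins for the head tensors; the dictionary layout and the function names `extract_patch` and `apply_patch` are hypothetical, and only the parameter names head.score.weight and head.score.bias come from the text.

```python
import copy

def extract_patch(tuned_head, class_index):
    """Keep only the fine-tuned weight row and bias of one class."""
    return {
        "class_index": class_index,
        "weight": list(tuned_head["head.score.weight"][class_index]),
        "bias": tuned_head["head.score.bias"][class_index],
    }

def apply_patch(edge_head, patch):
    """Return a fused copy of the edge head with one class row replaced."""
    fused = copy.deepcopy(edge_head)
    t = patch["class_index"]
    fused["head.score.weight"][t] = list(patch["weight"])
    fused["head.score.bias"][t] = patch["bias"]
    return fused

# Illustrative 3-class heads with 2 features each.
tuned_head = {
    "head.score.weight": [[0.1, 0.2], [0.9, 0.8], [0.3, 0.4]],
    "head.score.bias": [0.0, 0.5, 0.0],
}
edge_head = {
    "head.score.weight": [[0.1, 0.2], [0.5, 0.5], [0.3, 0.4]],
    "head.score.bias": [0.0, 0.1, 0.0],
}
patch = extract_patch(tuned_head, class_index=1)
fused_head = apply_patch(edge_head, patch)
```

Because the patch holds a single weight row and one bias value, it is much smaller than the full model, which fits the description of a lightweight update module.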

[0120] During training, let the set of all target categories be $C = \{T_1, T_2, \ldots, T_N\}$. For each category $T_i \in C$, calculate the accuracy difference between the cloud model and the edge model after each training round:

[0121] $\delta_{T_i} = A_{cloud}^{T_i} - A_{edge}^{T_i}$

[0122] where $A_{cloud}^{T_i}$ is the prediction accuracy of category $T_i$ on the cloud model; $A_{edge}^{T_i}$ is the prediction accuracy of category $T_i$ on the edge model; and $\delta_{T_i}$ indicates the performance difference between the edge model and the cloud model. Define a function to indicate the training completion status:

[0123]

[0124] Here θ is an adjustable accuracy convergence threshold (0.02). When the accuracy difference of every target category does not exceed this threshold, i.e. $\delta_{T_i} \le \theta$ for all $T_i \in C$, all target categories are determined to have been transferred, and the training process is terminated automatically.
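
The per-category completion check and the automatic stop condition can be sketched as follows. The function name, the use of a set to record completed categories, and the accuracy values are illustrative assumptions; only the threshold 0.02 and the rule that a completed category is not triggered again come from the text.

```python
THETA = 0.02  # accuracy convergence threshold given in the text

def update_completion(acc_cloud, acc_edge, done, theta=THETA):
    """Mark categories whose cloud-edge accuracy gap is within theta.

    `done` is a set of finished categories and is updated in place.
    Returns True once every category is finished, i.e. training can stop.
    """
    for category in acc_cloud:
        if category in done:
            continue  # a finished category is never triggered again
        delta = acc_cloud[category] - acc_edge[category]
        if delta <= theta:
            done.add(category)
    return len(done) == len(acc_cloud)

# Round 1: T1 has converged, T2 has not.
done = set()
acc_cloud = {"T1": 0.90, "T2": 0.85}
finished_round1 = update_completion(acc_cloud, {"T1": 0.89, "T2": 0.70}, done)
done_after_round1 = set(done)
# Round 2: T2 catches up, so training stops.
finished_round2 = update_completion(acc_cloud, {"T1": 0.89, "T2": 0.84}, done)
```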

[0125] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on the differences from other embodiments. The same or similar parts between the various embodiments can be referred to each other.

[0126] Specific examples have been used to illustrate the principles and implementation methods of this invention. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of this invention. Furthermore, those skilled in the art will recognize that, based on the ideas of this invention, there will be changes in the specific implementation methods and application scope. Therefore, the content of this specification should not be construed as a limitation of this invention.

Claims

1. A cloud-edge collaborative method based on multi-source heterogeneous information fusion, characterized in that it comprises:
Step 1: Acquire multimodal images, then process and fuse them. Decompose the multimodal images into a basic part and detail content, fuse the basic parts using a weighted average strategy, extract features from the detail content using the VGG-19 network, generate the fused detail content through a selection strategy, and finally reconstruct the fused image.
Step 2: Construct the Faster R-CNN network and a multi-label training dataset. Replace the single-label classification head of Faster R-CNN with a multi-label Sigmoid layer, train the network using a multi-label binary cross-entropy loss combined with a bounding box regression loss, and output the multi-label membership probability vector of each target candidate region.
Step 3: Fuse the basic belief assignments from multiple perspectives using supervised and unsupervised methods. The supervised weights are based on the historical accuracy of the sensors. The unsupervised weights are determined by constructing an evidence distance matrix, calculating the distance between each piece of evidence, determining the best and worst BPA, using Deng's relative entropy to measure the best-to-others vector and the others-to-worst vector, solving for the evidence weights through a constrained optimization problem, and determining the acceptance threshold through sensitivity analysis. The evidence is discounted based on the fusion of the two weights, and the final belief assignment is generated using Dempster's rule.
Step 4: Construct a cloud-based reinforcement learning decision model. Based on the prediction results of the edge model and the cloud model, extract the state features of each target category, construct the state space, and use a deep Q-network to build a reinforcement learning model. Train the model to obtain a transfer decision policy based on performance gain.
Step 5: Perform cloud-side classification head transfer and edge-side model fusion. Based on the transfer action output by the reinforcement learning system, fine-tune the classification head parameters of the Faster R-CNN model for the selected category to form a transfer patch, replace the corresponding position of the original model with the patch to build a fused model, and complete the edge inference update under multi-view image input.
Wherein, in step 4, constructing the cloud-based reinforcement learning decision model, extracting the state features of each target category based on the prediction results of the edge model and the cloud model, constructing the state space, establishing the reinforcement learning model with a deep Q-network, and training the model to obtain a transfer decision policy based on performance gain specifically comprises:
Based on the prediction results of the edge model and the cloud model, a state feature vector is extracted for each target category, including the edge-side category accuracy $A_{edge}$, the cloud-side category accuracy $A_{cloud}$, and the difference between the two, $\Delta = A_{cloud} - A_{edge}$, which constitute the state vector $S = [A_{edge}, A_{cloud}, \Delta]$;
Construct a reinforcement learning agent with a deep Q-network at its core, and define the action set {0, 1}, where 0 indicates skipping the migration and 1 indicates triggering the migration operation for the current category;
A reward mechanism is set up: after the migration is executed, the accuracy change of this category, $\delta = A_{cloud} - A_{edge}$, is evaluated based on the edge model and used to calculate the reward R, in which θ is the minimum accuracy gain threshold, δ′ is the accuracy of the previous effective migration, $A_{cloud}$ is the cloud-side accuracy of the current model, and $A_{edge}$ is the edge-side accuracy of the current model;
The policy is iteratively updated using the Q-function, outputting a transfer policy $\pi(s) = \arg\max_a Q(s, a)$ based on the state S, which indicates whether to perform a migration for the target category.

2. The method according to claim 1, characterized in that, in step 1, the multimodal images are decomposed into a basic part and detail content, the basic parts are fused using a weighted average strategy, features are extracted from the detail content using the VGG-19 network, the fused detail content is generated through a selection strategy, and the fused image is finally reconstructed, specifically:
Acquire multimodal images, including visible light images, infrared images, and SAR images;
For each input image (visible light, infrared, or SAR), obtain the basic part and the detail content by solving an optimization problem whose terms involve the basic part, the gradient operators, and the detail content;
Fuse the basic parts of the visible light, infrared, and SAR images at each pixel coordinate using a weighted average strategy;
Use the pre-trained VGG-19 network to extract multi-layer features from the detail content, compute the activity level map through a norm, generate a weight map through softmax, upsample and perform weighted fusion of the details, and finally generate the fused detail content through a maximum selection strategy;
Add the fused basic part and the fused detail content directly to obtain the final fused image.

3. The method according to claim 2, characterized in that, in step 2, the multi-label training dataset is constructed as follows: obtain the label vector formed for each target instance, in which each element indicates whether the instance belongs to the corresponding class label, and use these label vectors to construct the multi-label training dataset.

4. The method according to claim 3, characterized in that, in step 2, the loss function of the Faster R-CNN network combines the multi-label binary cross-entropy loss between the predicted label probability vector and the actual label vector with the bounding box regression loss between the predicted and actual bounding box coordinates. The Faster R-CNN network outputs the instance number, predicted class, bounding box coordinates, and confidence scores for all classification labels. The detection results from each sensor are converted into a unified format: [instance index, class, bounding box, probability array].

5. The method according to claim 4, characterized in that, in step 3, the basic belief assignments from multiple perspectives are fused using supervised and unsupervised methods, wherein the supervised weights are based on the historical accuracy of the sensors, and the unsupervised weights are determined by constructing an evidence distance matrix, calculating the distance between each piece of evidence, determining the best and worst BPA, using Deng's relative entropy to measure the best-to-others and others-to-worst vectors, solving for the evidence weights through a constrained optimization problem, and determining the acceptance threshold through sensitivity analysis; the evidence is discounted based on the fusion of the two weights, and the final belief assignment is generated using Dempster's rule, specifically:
Construct the Jousselme evidence distance matrix based on the multi-label membership probability vectors;
Calculate the distance between each piece of evidence, select the piece of evidence with the largest value as the worst belief, and select the piece that is furthest away as the best belief;
Calculate the best-to-others and others-to-worst vectors based on Deng's relative entropy, and solve for the evidence weights through a constrained optimization problem;
Determine the acceptance threshold through sensitivity analysis; if the threshold condition is satisfied, the weights are accepted, otherwise the comparison matrix is readjusted;
Determine the supervised weights based on the historical accuracy of the sensors, and fuse the two weights by weighting to obtain new weights; the beliefs are then discounted according to the new weights and combined using Dempster's rule to obtain the final belief assignment.

6. The method according to claim 5, characterized in that, in step 5, cloud-side classification head transfer and edge-side model fusion are performed: based on the transfer actions output by the reinforcement learning system, the classification head parameters of the Faster R-CNN model for the selected category are fine-tuned to form a transfer patch, the patch replaces the corresponding position in the original model to construct the fused model, and edge inference updates are completed under multi-view image input, specifically:
Based on the output of the generated transfer policy, select the category with action 1 and trigger the transfer training of the corresponding classification head;
After each transfer learning iteration, calculate the accuracy of this category on the cloud model and on the edge model, denoted $A_{cloud}$ and $A_{edge}$, and calculate the accuracy difference $\delta = A_{cloud} - A_{edge}$; if $\delta \le \theta$, the transfer is considered complete, the transfer training patch for that category is saved, and no further transfer operations for that category will be triggered in subsequent training cycles;
During the transfer training process, all parameters in the Faster R-CNN network except for the classification head corresponding to the current target class are frozen, and backpropagation and parameter optimization are performed only on the weight vector head.score.weight and the bias term head.score.bias corresponding to the class;
After the transfer is completed, the patch parameters obtained by training are stored as a dedicated lightweight update module for the class and written into the corresponding position of the edge model to complete the local update and the deployment of the fused model;
Once all target categories have completed the migration, that is, once the cloud-edge accuracy difference of each category does not exceed the threshold θ, the entire training process is terminated automatically.