An explanation sequence generation method for assisting visual decision of an unmanned system

By generating a set of decision saliency maps using global workspace theory and the maximum empirical risk algorithm, the method addresses the lack of interpretability in visual supervision task models for unmanned systems. It provides a hierarchical basis for decisions and a transparent decision-making process, making the model easier to interpret and to iterate on.

CN115272782B · Active · Publication Date: 2026-06-26 · FUDAN UNIVERSITY

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: FUDAN UNIVERSITY
Filing Date: 2021-12-27
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

While existing visual supervision task models for unmanned systems can automatically identify object information, they cannot inform users of the model's decision-making basis, leading to difficulties in model iteration and a lack of interpretability.

Method used

By employing global workspace theory and combining feature map tensors and gradient information from convolutional neural networks, a set of decision saliency maps is obtained through the maximum empirical risk algorithm. The weight coefficients of the saliency map set are then obtained based on the global workspace mapping, and finally, a decision sequence is synthesized to provide hierarchical decision-making basis.

Benefits of technology

It achieves interpretability of model decisions, gives unmanned systems a transparent decision-making process, provides developers with ideas for iterative optimization, and enhances the reliability of the model.

✦ Generated by Eureka AI based on patent content.

Figure CN115272782B_ABST

Abstract

The application provides an explanation sequence generation method for assisting visual decision-making in an unmanned system. The method provides a hierarchical decision basis for the supervision tasks of the unmanned system, gives the system decision transparency, and makes it easier for developers to continuously optimize the model. The method comprises the following steps: step S1, image data is input into a pre-trained convolutional neural network model to obtain the feature map tensor of the last layer and the gradient information of each layer; step S2, based on the feature map tensor and the gradient information of each layer, an explainable artificial intelligence method is used to obtain a decision saliency map set; step S3, based on the decision saliency map set, an activated input data image set is obtained; step S4, the activated input data image set is input into the pre-trained convolutional neural network model, and global workspace mapping is used to obtain the weight coefficients corresponding to each saliency map group; and step S5, a saliency map is synthesized based on the weight coefficients corresponding to each saliency map group, and a decision sequence is obtained based on a predetermined order of the weight coefficients.

Description

Technical Field

[0001] This invention belongs to the field of computer applications, and specifically relates to a method for generating an interpretation sequence to assist visual decision-making in unmanned systems.

Background Technology

[0002] Visual decision-making tasks are common in unmanned systems, such as traffic light recognition in autonomous driving scenarios and obstacle recognition in food delivery robots. Currently, well-performing visual supervision models can automatically identify object information, but they cannot inform users of the basis for the model's decisions. This poses a challenge to model iteration for unmanned system developers. Therefore, interpretable methods for model decision-making, derived from real-world needs, can provide explainable evidence for model users and offer evidence when the model makes mistakes.

[0003] Understanding and interpreting the decisions made by deep neural network-based models is crucial for developers. Methods that make models interpretable help build user trust in deep neural network models. In computer vision, popular interpretation methods generate intuitive saliency maps that highlight the regions most relevant to the neural network's decisions. There are many methods for saliency map generation, such as region-based methods like XRAI, gradient-based methods like Grad-CAM, and methods that introduce data uncertainty, such as aleatoric uncertainty. This wide variety of interpretation map generation methods can make it difficult for model developers to choose the appropriate one.

[0004] To unify the above-mentioned model explanations, this invention draws on cognitive science's understanding of brain activity during human interpretation. Global workspace theory, proposed by American psychologist Bernard Baars, is a model of consciousness during human decision-making. This theory posits that consciousness is associated with a global "broadcasting system" that broadcasts information throughout the brain. The brain's dedicated visual processing modules typically process information automatically without actively reflecting on the basis of the decision. When a person actively reflects on the decision information, various visual processing modules compete to analyze the information within the global workspace to obtain an interpretation.

Summary of the Invention

[0005] To address the aforementioned problems, this invention provides a method for generating explanatory sequences to assist visual decision-making in unmanned systems. The technical solution adopted in this invention is as follows:

[0006] This invention provides a method for generating interpretable sequences to assist visual decision-making in unmanned systems, characterized by the following steps: Step S1, inputting image data into a pre-trained convolutional neural network model to obtain the feature map tensor of the last layer and the gradient information of each layer; Step S2, based on the feature map tensor and the gradient information of each layer, using an artificial intelligence interpretable method to obtain a set of decision saliency maps; Step S3, obtaining a set of activated input data images based on the set of decision saliency maps; Step S4, inputting the set of activated input data images into the pre-trained convolutional neural network model, and using global workspace mapping to obtain the weight coefficients corresponding to each saliency map group; Step S5, synthesizing saliency maps based on the weight coefficients corresponding to each saliency map group, and obtaining a decision sequence based on a predetermined order of the weight coefficients.

[0007] The method provided by the present invention may also have the following technical feature: the feature map tensor of the last layer and the gradient information of each layer are obtained as follows. The image data $I$ is input into the pre-trained convolutional neural network model and forward propagation is performed. Based on the loss function, the feature map tensor of the last layer is obtained as $F \in \mathbb{R}^{C \times H \times W}$, where $C$ is the number of channels, $H$ the image width, and $W$ the image length, together with the gradient information of each layer, where $l$ indexes a layer of the convolutional neural network.

[0008] The present invention provides a method for generating an interpretation sequence for assisting visual decision-making in unmanned systems, which may also have the following technical feature, wherein, in step S2, the decision saliency map set is:

[0009] $S = \{s_{\text{texture}}, s_{\text{shape}}, s_{\text{confidence}}\}$

[0010] In the formula, each saliency map $s$ has the same image size as the image data $I$.

[0011] The present invention provides a method for generating an interpretation sequence to assist visual decision-making in unmanned systems, which may also have the following technical feature, wherein, in step S3, the set of activated input data images is:

[0012]

[0013] In the formula, Λ is the perturbation matrix.

[0014] The present invention provides a method for generating an interpretation sequence to assist visual decision-making in unmanned systems, which also has the following technical feature: the global workspace mapping uses a maximum empirical risk algorithm as the learning method. The maximum empirical risk algorithm is as follows:

[0015]

[0016] In the formula, $\theta$ denotes the model parameters; the expectation is a nonlinear expectation; $x$ denotes the input data and $y$ the supervision information; $g_\theta$ is the parameterized model; $p(x, y)$ is the data distribution; $N$ is the number of methods used to generate interpretation maps; $s_j$ is the number of samples used to train the $j$-th interpretation map generation method; $x_{jk}$ is the $k$-th data sample of the $j$-th method; $y_{jk}$ is the label of $x_{jk}$; and the remaining term is the loss function.

[0017] The method provided by the present invention may also have the following technical feature, wherein step S4 includes the following sub-steps. Step S4-1: the activated input data image set $I'$ is input into the convolutional neural network model, and the gradients of the loss function with respect to the model parameters $\theta$ are computed and combined into the matrix $G$:

[0018]

[0019] In the formula, the texture term denotes the gradient of the loss function computed with the texture-based model parameters $\theta_{\text{texture}}$; $s_{\text{texture}}$, $s_{\text{shape}}$, and $s_{\text{model confidence}}$ are the saliency maps generated by the respective existing interpretability methods; and $T$ denotes the transpose. Step S4-2: to obtain the weight coefficients $\lambda$ of the unified semantic interpretation representation, the maximum empirical risk algorithm is transformed, through quadratic approximation, into the following optimization problem, and $\lambda$ is obtained using a traditional quadratic programming solver:

[0020]

[0021]

[0022] The method provided by the present invention may also have the following technical feature, wherein, in step S5, the synthesized saliency map is as follows:

[0023]

[0024] The decision sequence is obtained by sorting the elements in the weight coefficients λ from highest to lowest:

[0025] $\mathrm{Seq}_{out} = \mathrm{rank}(\lambda)$.

[0026] The present invention also provides an interpretation sequence generation system for assisting visual decision-making in unmanned systems, characterized in that it includes: a media data acquisition module, a calculation module, and a result display module. The media data acquisition module is used to acquire image data, the calculation module is used to process and analyze the image data and acquire the corresponding decision sequence and synthetic saliency map, and the result display module is used to display the image data and the decision sequence and synthetic saliency map output by the calculation module.

[0027] The system for generating an interpretation sequence for assisting unmanned systems in visual decision-making, provided by this invention, may also have the following technical features: the computing module has an embedded coding unit, which includes a coding subunit, an interpretable subunit, a global workspace subunit, and a fusion subunit. The coding subunit is used to analyze the input image data to obtain a result encoding. The interpretable subunit obtains post-interpretive data corresponding to the result encoding based on an interpretable artificial intelligence algorithm. The global workspace subunit uses a maximum empirical risk algorithm to compete among the post-interpretive data to obtain interpretive representation weight coefficients with unified semantics. The fusion subunit is used to fuse the weight coefficients and the corresponding post-interpretive data to obtain a synthetic saliency map and a decision sequence.

[0028] Invention Function and Effect

[0029] According to the method of the present invention for generating an interpretation sequence to assist visual decision-making in unmanned systems, a set of decision saliency maps is obtained based on the feature map tensor of the input image data and the gradient information of each layer. The weight coefficients of the saliency map group are obtained by applying the global workspace interpretability method to the activated set of decision saliency maps. Finally, the decision sequence is obtained by sorting the elements of the weight coefficients from high to low.

[0030] The method of the present invention adopts a mapping that conforms to the global workspace mechanism. When the model user needs to know the basis of the model's decision, semantically interpretable saliency map information competes to enter the global workspace mapping, which unifies the semantic interpretation and ultimately yields an interpretable hierarchical decision sequence and a fused saliency map. This provides a hierarchical decision basis for the supervision tasks of unmanned systems and gives the system decision-making transparency. It also gives developers a direction for iterating on the model, making continuous optimization more convenient.

Attached Figure Description

[0031] Figure 1 is a flowchart of the method for generating an interpretation sequence to assist visual decision-making in unmanned systems according to an embodiment of the present invention;

[0032] Figure 2 is a schematic diagram of the structure of the interpretation sequence generation system for assisting visual decision-making in unmanned systems in an embodiment of the present invention;

[0033] Figure 3 is a schematic diagram of the embedded coding unit in an embodiment of the present invention.

Detailed Implementation

[0034] To make the technical means, creative features, objectives, and effects of this invention easy to understand, the method for generating an interpretation sequence to assist visual decision-making in unmanned systems is described in detail below with reference to embodiments and the accompanying drawings.

[0035] <Example>

[0036] Figure 1 is a flowchart of the method for generating an interpretation sequence to assist visual decision-making in unmanned systems according to an embodiment of the present invention.

[0037] As shown in Figure 1, the method for generating an interpretation sequence to assist visual decision-making in unmanned systems includes the following steps:

[0038] Step S1: Input the image data $I$ into the pre-trained convolutional neural network model and perform forward propagation; based on the loss function, obtain the feature map tensor of the last layer and the gradient information of each layer.

[0039] The feature map tensor of the last layer is $F \in \mathbb{R}^{C \times H \times W}$, where $C$ is the number of channels, $H$ the image width, and $W$ the image length. The gradient information is obtained for each layer $l$ of the convolutional neural network.
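Step S1 can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the "network" is a toy with one 1x1 convolution, a ReLU, global average pooling, and a linear classifier with cross-entropy loss, and the function name `step_s1` and its parameters are hypothetical. It returns the last-layer feature map with shape (C, H, W) and one gradient per layer.

```python
import numpy as np

def step_s1(image, conv_kernel, fc_weights, label):
    """Toy forward/backward pass (hypothetical stand-in for the pre-trained CNN).

    image:       input data I with shape (C_in, H, W)
    conv_kernel: 1x1 convolution weights with shape (C, C_in)
    fc_weights:  classifier weights with shape (num_classes, C)
    label:       integer class index used as supervision
    """
    # Forward propagation.
    pre = np.einsum("oc,chw->ohw", conv_kernel, image)  # 1x1 convolution
    feature_map = np.maximum(pre, 0.0)                  # ReLU, shape (C, H, W)
    C, H, W = feature_map.shape
    pooled = feature_map.mean(axis=(1, 2))              # global average pooling
    logits = fc_weights @ pooled
    shifted = logits - logits.max()                     # numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    loss = -np.log(probs[label])                        # cross-entropy loss

    # Backward propagation, layer by layer.
    d_logits = probs.copy()
    d_logits[label] -= 1.0                              # softmax minus one-hot
    d_fc = np.outer(d_logits, pooled)
    d_pooled = fc_weights.T @ d_logits
    d_feature = np.broadcast_to(d_pooled[:, None, None] / (H * W), (C, H, W)).copy()
    d_pre = d_feature * (pre > 0)                       # ReLU gate
    d_conv = np.einsum("ohw,chw->oc", d_pre, image)

    layer_grads = {"conv": d_conv, "fc": d_fc}          # gradient information per layer
    return feature_map, loss, d_feature, layer_grads
```

In a real system these quantities would come from the framework's automatic differentiation applied to the actual pre-trained model; the hand-written backward pass here only makes the data flow explicit.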

[0040] Step S2: Based on the feature map tensor and the gradient information of each layer, various AI interpretability methods are used to obtain the decision saliency map set $S = \{s_{\text{texture}}, s_{\text{shape}}, s_{\text{confidence}}\}$, where each saliency map $s$ has the same image size as the image data $I$.
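As one example of how a single member of the set $S$ might be produced, the sketch below follows the Grad-CAM idea mentioned in the background section. It is an illustration under assumptions, not the patent's procedure: the function name `grad_cam_saliency` is hypothetical, the gradient passed in is assumed to be that of the class score with respect to the feature map, and nearest-neighbour upsampling is used only to bring the map to the input image size.

```python
import numpy as np

def grad_cam_saliency(feature_map, score_grad, image_hw):
    """Grad-CAM-style saliency map resized to the input image size.

    feature_map: last-layer feature map with shape (C, H, W)
    score_grad:  gradient of the class score w.r.t. feature_map, shape (C, H, W)
    image_hw:    (height, width) of the input image
    """
    C, H, W = feature_map.shape
    channel_weights = score_grad.mean(axis=(1, 2))            # one weight per channel
    cam = np.tensordot(channel_weights, feature_map, axes=1)  # weighted sum, shape (H, W)
    cam = np.maximum(cam, 0.0)                                # keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                                 # scale to [0, 1]
    out_h, out_w = image_hw
    rows = np.arange(out_h) * H // out_h                      # nearest-neighbour upsampling
    cols = np.arange(out_w) * W // out_w
    return cam[np.ix_(rows, cols)]
```

Other members of the set (for example region-based or uncertainty-based maps) would be produced by their own methods and resized to the same shape.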

[0041] Step S3: Obtain the set of activated input data images $I'$ based on the decision saliency map set $S$. In the formula, $\Lambda$ is the perturbation matrix.
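The formula for the activated images is not reproduced in this text, so the following is only a guess at its general shape. It assumes, purely for illustration, that each activated image keeps the input where the saliency map is high and falls back to the perturbation matrix elsewhere, i.e. $s \odot I + (1 - s) \odot \Lambda$. The function name `activate_images` and the dictionary layout are hypothetical.

```python
import numpy as np

def activate_images(image, saliency_maps, perturbation):
    """Blend the input with a perturbation, one activated image per saliency map.

    ASSUMPTION: the blend s * I + (1 - s) * Lambda is illustrative only;
    the patent's actual formula is not available in this text.

    image:         input data I with shape (C, H, W)
    saliency_maps: dict mapping a method name to a map with shape (H, W) in [0, 1]
    perturbation:  perturbation matrix Lambda with shape (C, H, W)
    """
    activated = {}
    for name, s in saliency_maps.items():
        mask = s[None, :, :]                      # broadcast over channels
        activated[name] = mask * image + (1.0 - mask) * perturbation
    return activated
```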

[0042] Step S4: Input the activated input data image set into the pre-trained convolutional neural network model, and use global workspace mapping to obtain the weight coefficients corresponding to each saliency map group.

[0043] In this embodiment, the maximum empirical risk algorithm is used as the global workspace mapping learning method. The maximum empirical risk algorithm is as follows:

[0044]

[0045] In the formula, $\theta$ denotes the model parameters; the expectation is a nonlinear expectation; $x$ denotes the input data and $y$ the supervision information; $g_\theta$ is the parameterized model; $p(x, y)$ is the data distribution; $N$ is the number of methods used to generate interpretation maps; $s_j$ is the number of samples used to train the $j$-th interpretation map generation method; $x_{jk}$ is the $k$-th data sample of the $j$-th method; $y_{jk}$ is the label of $x_{jk}$; and the remaining term is the loss function.
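The formula itself is missing from this text. Going only by the symbol definitions, one plausible reading is that the empirical risk is the largest of the $N$ per-method average losses, each averaged over that method's $s_j$ samples. The sketch below implements that reading as an assumption; the function name `maximum_empirical_risk` is hypothetical, and the per-sample losses are taken as given rather than computed from $g_\theta$.

```python
import numpy as np

def maximum_empirical_risk(group_losses):
    """Return the largest per-group mean loss and the index of that group.

    ASSUMPTION: "maximum empirical risk" is read as max over j of the mean
    loss of group j; this is an interpretation, not the patent's formula.

    group_losses: list of N 1-D arrays; entry j holds the per-sample losses
                  for the j-th interpretation map generation method.
    """
    group_means = [float(np.mean(losses)) for losses in group_losses]
    worst = int(np.argmax(group_means))
    return group_means[worst], worst
```

Under this reading, training would minimize the returned value with respect to the model parameters, so the worst-performing group dominates the objective.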

[0046] This step S4 includes the following sub-steps:

[0047] Step S4-1: Input the activated input data image set $I'$ into the convolutional neural network model. Compute the gradients of the loss function with respect to the model parameters $\theta$ and combine them into the matrix $G$:

[0048]

[0049] In the formula, the texture term denotes the gradient of the loss function computed with the texture-based model parameters $\theta_{\text{texture}}$; $s_{\text{texture}}$, $s_{\text{shape}}$, and $s_{\text{model confidence}}$ are the saliency maps generated by the respective existing interpretability methods; and $T$ denotes the transpose;
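To make the construction of $G$ concrete, the sketch below stacks one flattened loss gradient per activated image. It is a simplified illustration: the model is assumed to be a linear softmax classifier rather than a convolutional network, the row-per-method layout of $G$ is an assumption, and the names `loss_gradient` and `build_gradient_matrix` are hypothetical.

```python
import numpy as np

def loss_gradient(theta, x, label):
    """Gradient of the cross-entropy loss w.r.t. theta for a linear softmax model."""
    logits = theta @ x
    shifted = logits - logits.max()              # numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    probs[label] -= 1.0                          # softmax minus one-hot
    return np.outer(probs, x).ravel()            # flattened to a vector

def build_gradient_matrix(theta, activated_images, label):
    """Stack per-method gradients into G, one row per saliency method (assumed layout)."""
    names = list(activated_images)
    rows = [loss_gradient(theta, activated_images[n].ravel(), label) for n in names]
    return names, np.stack(rows)
```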

[0050] Step S4-2, to obtain the weight coefficients λ of the unified semantic interpretation representation, the above maximum empirical risk algorithm is transformed into solving the following optimization problem through quadratic approximation:

[0051]

[0052]

[0053] In the formula, G is the matrix defined in step S4-1, λ is the Lagrange multiplier, and N is the total number of saliency map generation methods.

[0054] To solve the quadratic programming problem, the weight coefficients λ can be obtained using a traditional quadratic programming solver.
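The optimization problem is not reproduced in this text. A common quadratic program of this kind, assumed here for illustration, is to minimize $\lVert G^{T}\lambda \rVert^{2}$ subject to $\lambda_j \ge 0$ and $\sum_{j=1}^{N} \lambda_j = 1$. The sketch solves that assumed problem with the Frank-Wolfe method and exact line search instead of a general-purpose quadratic programming solver; the function name `solve_simplex_qp` is hypothetical.

```python
import numpy as np

def solve_simplex_qp(G, num_iters=500):
    """Minimize 0.5 * ||G^T lam||^2 over the probability simplex (assumed problem form).

    G: matrix with one row per saliency method, shape (N, P).
    Returns lam with shape (N,), lam >= 0 and sum(lam) == 1.
    """
    N = G.shape[0]
    gram = G @ G.T                        # (N, N) Gram matrix
    lam = np.full(N, 1.0 / N)             # start at the simplex centre
    for _ in range(num_iters):
        grad = gram @ lam                 # gradient of 0.5 * lam^T gram lam
        j = int(np.argmin(grad))          # vertex minimizing the linearized objective
        direction = -lam.copy()
        direction[j] += 1.0               # move towards vertex e_j
        curvature = direction @ gram @ direction
        if curvature <= 1e-12:
            break
        step = float(np.clip(-(grad @ direction) / curvature, 0.0, 1.0))
        if step == 0.0:
            break                         # no further descent along this direction
        lam = lam + step * direction
    return lam
```

Because every update is a convex combination of the current point and a simplex vertex, the iterate stays feasible without an explicit projection step.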

[0055] Step S5: Synthesize a saliency map based on the weight coefficients corresponding to each saliency map group, and obtain a decision sequence based on the predetermined order of the weight coefficients.

[0056] The synthesized saliency map is as follows:

[0057]

[0058] The decision sequence $\mathrm{Seq}_{out}$ is obtained by sorting the elements of the weight coefficients from highest to lowest:

[0059] $\mathrm{Seq}_{out} = \mathrm{rank}(\lambda)$.
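Step S5 can be sketched as follows. The ranking follows the text directly; the fusion rule is an assumption, since the formula for the synthesized saliency map is not reproduced here, and a weighted sum of the individual maps is used for illustration. The function name `synthesize_and_rank` and the example weights are hypothetical.

```python
import numpy as np

def synthesize_and_rank(saliency_maps, weights):
    """Fuse saliency maps and order the methods by weight, highest first.

    ASSUMPTION: the fused map is the weighted sum of the individual maps.

    saliency_maps: dict mapping a method name to a map with shape (H, W)
    weights:       dict mapping the same names to weight coefficients
    """
    names = list(saliency_maps)
    fused = sum(weights[n] * saliency_maps[n] for n in names)
    sequence = sorted(names, key=lambda n: weights[n], reverse=True)
    return fused, sequence

maps = {
    "texture": np.full((2, 2), 1.0),
    "shape": np.full((2, 2), 2.0),
    "confidence": np.full((2, 2), 3.0),
}
example_weights = {"texture": 0.3, "shape": 0.5, "confidence": 0.2}
fused_map, decision_sequence = synthesize_and_rank(maps, example_weights)
print(" -> ".join(decision_sequence))
```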

[0060] Figure 2 is a schematic diagram of the structure of the interpretation sequence generation system for assisting visual decision-making in unmanned systems according to an embodiment of the present invention.

[0061] As shown in Figure 2, the interpretation sequence generation system 100 includes a media data acquisition module 1, a calculation module 2, and a result display module 3.

[0062] The media data acquisition module 1 is used to acquire image data from the camera device or locally stored data.

[0063] The calculation module 2 is used to process and analyze the image data acquired by the media data acquisition module 1 and obtain the corresponding decision sequence and synthesized saliency map. This calculation module is executable code.

[0064] The calculation module 2 has a graphics processing unit and an embedded coding unit. The graphics processing unit preprocesses the image data, and the embedded coding unit is a pre-trained model that encodes the processed image data to generate the decision sequence corresponding to the image data and to synthesize the saliency map.

[0065] Figure 3 is a schematic diagram of the embedded coding unit in an embodiment of the present invention.

[0066] As shown in Figure 3, the embedded coding unit 10 has a coding subunit 101, an interpretable subunit 102, a global workspace subunit 103, and a fusion subunit 104.

[0067] The coding subunit 101 is used to analyze the input image data through a pre-trained model to obtain the result encoding.

[0068] The interpretable subunit 102 obtains the post-interpretation data corresponding to the result encoding based on an interpretable artificial intelligence algorithm.

[0069] The global workspace subunit 103 uses the maximum empirical risk algorithm to run a competition among the post-interpretation data and obtain the interpretation representation weight coefficients with unified semantics.

[0070] The fusion subunit 104 is used to fuse the weight coefficients and the corresponding post-interpretation data to obtain a synthetic saliency map and a decision sequence.

[0071] The result display module 3 is used to display the image data acquired by the media data acquisition module 1 together with the decision sequence and synthetic saliency map output by the calculation module 2. The result display module 3 can be a computer or a mobile device.
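The module structure described above can be outlined as plain Python classes. This is a structural sketch only: the class names `EmbeddedCodingUnit` and `ExplanationSequenceSystem`, their fields, and the use of callables for each subunit are all hypothetical choices, and the graphics preprocessing unit is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

@dataclass
class EmbeddedCodingUnit:
    """Chains the four subunits 101 to 104 (hypothetical interface)."""
    coding: Callable[[Any], Any]                              # image -> result encoding
    interpretable: Callable[[Any], Dict[str, Any]]            # encoding -> post-interpretation data
    global_workspace: Callable[[Dict[str, Any]], Dict[str, float]]  # data -> weight coefficients
    fusion: Callable[[Dict[str, Any], Dict[str, float]], Tuple[Any, List[str]]]

    def run(self, image: Any) -> Tuple[Any, List[str]]:
        encoding = self.coding(image)
        explanations = self.interpretable(encoding)
        weights = self.global_workspace(explanations)
        return self.fusion(explanations, weights)             # (saliency map, decision sequence)

@dataclass
class ExplanationSequenceSystem:
    """Acquisition module 1, calculation module 2, display module 3 (hypothetical interface)."""
    acquire: Callable[[], Any]
    unit: EmbeddedCodingUnit
    display: Callable[[Any, Any, List[str]], None]

    def step(self) -> Tuple[Any, List[str]]:
        image = self.acquire()
        fused, sequence = self.unit.run(image)
        self.display(image, fused, sequence)
        return fused, sequence
```

Passing each subunit as a callable keeps the sketch independent of any particular model or explanation method; a concrete system would plug in the pre-trained network and the chosen interpretability methods.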

[0072] Functions and effects of the embodiments

[0073] According to the method for generating interpretable sequences for assisted visual decision-making in unmanned systems provided in this embodiment, the method obtains a set of decision saliency maps based on the feature map tensor of the input image data and the gradient information of each layer. The activated set of decision saliency maps is then used with a global workspace interpretability method to obtain the weight coefficients of the saliency map group. Based on these weight coefficients, the elements are sorted from highest to lowest to obtain the decision sequence. This method employs a mapping conforming to the global workspace mechanism, allowing semantically interpretable saliency map information to competitively enter the global workspace mapping when the model user needs to know the basis of the model's decision, thus unifying the semantic interpretation. Ultimately, an interpretable hierarchical decision sequence and fused saliency map are obtained, providing hierarchical decision-making basis for the supervision task of unmanned systems, giving the system decision-making transparency, and also providing developers with an iterative approach to the model, facilitating continuous model optimization.

[0074] In the embodiment, since the decision sequence is obtained by sorting the weight coefficients of the saliency map group from high to low, developers can quickly identify the model's decision factors from the saliency map and, combined with the decision sequence, learn the order in which factors influence the model's decision, for example "shape" -> "texture" -> "confidence", ordered from most to least influential.

[0075] The above embodiments are only used to illustrate specific implementations of the present invention, and the present invention is not limited to the scope of the description of the above embodiments.

Claims

1. A method for generating an explanatory sequence to assist visual decision-making in unmanned systems, characterized in that it includes the following steps: Step S1: inputting image data into a pre-trained convolutional neural network model to obtain the feature map tensor of the last layer and the gradient information of each layer; Step S2: based on the feature map tensor and the gradient information of each layer, using an artificial intelligence interpretable method to obtain a set of decision saliency maps; Step S3: obtaining a set of activated input data images based on the decision saliency map set; Step S4: inputting the activated input data image set into the pre-trained convolutional neural network model, and using global workspace mapping to obtain the weight coefficients corresponding to each saliency map group; Step S5: synthesizing a saliency map based on the weight coefficients corresponding to each saliency map group, and obtaining a decision sequence based on the predetermined order of the weight coefficients; wherein the global workspace mapping employs the maximum empirical risk algorithm as the learning method, the maximum empirical risk algorithm being defined by a formula in which $\theta$ denotes the model parameters, the expectation is a nonlinear expectation, $x$ denotes the input data, $y$ denotes the supervision information, $g_\theta$ is the parameterized model, $p(x, y)$ is the data distribution, $N$ is the number of methods used to generate interpretation maps, $s_j$ is the number of samples used to train the $j$-th interpretation map generation method, $x_{jk}$ is the $k$-th data sample of the $j$-th method, $y_{jk}$ is the label of $x_{jk}$, and the remaining term is the loss function; and wherein step S4 includes the following sub-steps: Step S4-1: inputting the activated input data image set $I'$ into the convolutional neural network model, computing the gradients of the loss function with respect to the model parameters $\theta$, and combining them into a matrix $G$, in which the texture term denotes the gradient of the loss function computed with the texture-based model parameters, the saliency maps are those generated by existing interpretable methods, and $T$ denotes the transpose; Step S4-2: to obtain the weight coefficients $\lambda$ of the unified semantic interpretation representation, transforming the maximum empirical risk algorithm through quadratic approximation into an optimization problem, and obtaining the weight coefficients $\lambda$ using a traditional quadratic programming solver.

2. The method for generating an interpretation sequence for assisting visual decision-making in an unmanned system according to claim 1, characterized in that the feature map tensor of the last layer and the gradient information of each layer are obtained as follows: the image data $I$ is input into the pre-trained convolutional neural network model and forward propagation is performed; based on the loss function, the feature map tensor of the last layer is obtained as $F \in \mathbb{R}^{C \times H \times W}$, where $C$ is the number of channels, $H$ the image width, and $W$ the image length, together with the gradient information of each layer, where $l$ indexes a layer of the convolutional neural network.

3. The method for generating an interpretation sequence for assisting visual decision-making in an unmanned system according to claim 2, characterized in that, in step S2, the decision saliency map set is $S = \{s_{\text{texture}}, s_{\text{shape}}, s_{\text{confidence}}\}$, where each saliency map $s$ has the same image size as the image data $I$.

4. The method for generating an interpretation sequence for assisting visual decision-making in an unmanned system according to claim 3, characterized in that, in step S3, the set of activated input data images is defined by a formula in which $\Lambda$ is the perturbation matrix.

5. The method for generating an interpretation sequence for assisting visual decision-making in an unmanned system according to claim 4, characterized in that, in step S5, the synthesized saliency map is given by a formula, and the decision sequence is obtained by sorting the elements of the weight coefficients $\lambda$ from highest to lowest: $\mathrm{Seq}_{out} = \mathrm{rank}(\lambda)$.

6. A system for generating an interpretation sequence to assist visual decision-making in unmanned systems, used to execute the method for generating an interpretation sequence to assist visual decision-making in unmanned systems as described in claim 1, characterized in that it includes a media data acquisition module, a calculation module, and a result display module. The media data acquisition module is used to acquire image data. The calculation module is used to process and analyze the image data and obtain the corresponding decision sequence and synthetic saliency map. The result display module is used to display the image data together with the decision sequence and synthetic saliency map output by the calculation module. The calculation module includes an embedded coding unit, which has a coding subunit, an interpretable subunit, a global workspace subunit, and a fusion subunit. The coding subunit is used to analyze the input image data to obtain the result encoding. The interpretable subunit obtains the post-interpretation data corresponding to the result encoding based on an interpretable artificial intelligence algorithm. The global workspace subunit uses a maximum empirical risk algorithm to run a competition among the post-interpretation data and obtain interpretation representation weight coefficients with unified semantics. The fusion subunit is used to fuse the weight coefficients and the corresponding post-interpretation data to obtain the synthetic saliency map and the decision sequence.