A machine forgetting method for multimodal large language models based on causal orthogonal decoupling
By employing a causal orthogonal decoupling method and utilizing dual-scale hierarchical localization and an adaptive geometric erasure strategy, visual-textual associative memories in multimodal large language models are accurately erased, solving the problem of incomplete forgetting in existing technologies and preserving the model's visual perception and reasoning capabilities.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: SOUTHEAST UNIV
- Filing Date: 2026-03-12
- Publication Date: 2026-06-16
Figure CN121809559B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of deep learning and specifically relates to a machine forgetting method for multimodal large language models based on causal orthogonal decoupling.
Background Technology
[0002] Multimodal Large Language Models (MLLMs) are an advanced computational architecture that deeply integrates visual and textual information. By mapping image encodings into a language space, they demonstrate outstanding understanding and reasoning capabilities in tasks such as image captioning and visual question answering. However, this high degree of integration of multi-source data also introduces serious data security risks such as privacy leaks and copyright infringement. Machine forgetting (also known as machine unlearning), an emerging security paradigm, aims to accurately eliminate the influence of specific sensitive data or harmful concepts from a pre-trained model while preserving the model's general cognitive abilities as much as possible. It has become a key technical component in building trustworthy artificial intelligence systems.
[0003] Existing multimodal machine forgetting methods mainly follow the optimization strategies of unimodal models, including maximizing the loss on the samples to be forgotten using gradient ascent, or removing high-response parameters through neuron pruning. However, these methods largely ignore the "progressive fusion" dynamics of visual and textual information in large multimodal models. Because mainstream models (such as LLaVA) typically employ an early modality fusion architecture, visual features are not perfectly aligned with the text at the input stage; instead, they become semantically entangled with it gradually as the network deepens. Existing techniques often use a globally uniform update strategy, which unnecessarily damages the integrity of basic visual features in the shallow layers and fails to completely sever the strong causal relationship between vision and text in the deeper layers. Therefore, how to achieve accurate layer localization and semantic decoupling based on the progressive fusion pattern of multimodal information, ensuring thorough forgetting without compromising the model's general performance, remains a challenging problem that urgently needs to be solved in this field.
Summary of the Invention
[0004] To address the aforementioned issues, this invention discloses a machine forgetting method for multimodal large language models based on causal orthogonal decoupling. Exploiting the progressive fusion of visual and textual features in large multimodal models, it pinpoints the key layers through dual-scale localization and applies orthogonal repulsion and sensitivity minimization within them. It also introduces a composite regularization term to anchor the representations of non-target regions. The method can efficiently and accurately erase specific visual-textual association memories without compromising the model's shallow visual perception and general reasoning capabilities.
[0005] To achieve the above objectives, the technical solution of the present invention is as follows:
[0006] A machine forgetting method for multimodal large language models based on causal orthogonal decoupling includes the following steps:
[0007] S1. Construct a forgetting framework for a multimodal large language model; the framework includes: a student model to be optimized, a teacher model with frozen parameters, a forgetting dataset, and a retained dataset;
[0008] S2. Perform the macro-level localization stage of dual-scale hierarchical localization: use causal mediation analysis to calculate the fusion influence score and locate the semantic fusion layers;
[0009] S3. Perform the micro-level localization stage of dual-scale hierarchical localization: use the logit lens to calculate key sensitivity scores and locate the key sensitive layers;
[0010] S4. Generate a semantic-aware dynamic mask to isolate target concept tokens from non-target safe tokens;
[0011] S5. Based on the adaptive geometric erasure strategy, apply the dynamic mask at the located layers, calculate each loss component and the total loss function, and optimize the student model parameters to achieve machine forgetting.
[0012] Furthermore, step S2 specifically includes:
[0013] Locating the semantic fusion layers aims to determine the layers at which visual features causally lead to the generation of the target text concept. An input containing the target visual concept is compared with a baseline input in which the image is replaced with noise (such as a blank image): the visual hidden state from the former is patched into the baseline run, and the fusion influence score is calculated as in equation (1):
[0014] (1);
[0015] In equation (1), the first index denotes the model layer at which the visual features are injected; the second index denotes a subsequent text-processing layer that is affected by the injection; and the token index denotes the text token currently being processed. The two hidden states being compared are the intervened hidden state of that token at the affected layer after visual features are injected at the injection layer, and the original hidden state of the same token at the same layer under the baseline input. The comparison uses the cosine similarity between vectors. A threshold is set on the fusion influence score to filter out noise, and the contiguous run of layers whose scores exceed the threshold is defined as the set of semantic fusion layers.
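The comparison described for equation (1) can be sketched in plain Python. Because the formula itself is not reproduced in this text, the sketch assumes that the score is one minus the cosine similarity between the intervened and baseline hidden states, averaged over text tokens. The function names `cosine` and `fusion_influence` are hypothetical and serve only as an illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def fusion_influence(intervened, baseline):
    """Assumed form of the score at one affected layer:
    mean over text tokens of (1 - cosine(intervened, baseline))."""
    diffs = [1.0 - cosine(hi, hb) for hi, hb in zip(intervened, baseline)]
    return sum(diffs) / len(diffs)
```

Under this assumed form, a layer whose injection leaves the downstream text states unchanged scores 0, and the score grows as the intervened states rotate away from the baseline states.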
[0016] Furthermore, step S3 specifically includes:
[0017] The localization of the key sensitive layers uses the logit lens to detect the layers at which visual features are linearly aligned with the target concept. The hidden states of intermediate layers are projected onto the vocabulary space, and the key sensitivity score is calculated as in equation (2):
[0018] (2);
[0019] In equation (2), the layer index identifies the model layer being scored, and the target concept is the concept to be forgotten. The score involves the set of indices of all image tokens in the input, the total number of image tokens in that set, and the index of a specific token in that set. For each such token, the hidden state at the given layer is projected through the logit lens to obtain the probability that it belongs to the target concept. The global calibration factor, defined as the maximum norm of the projected logit vectors over the forgetting dataset, normalizes the scale differences between layers. The layers are ranked by their key sensitivity scores, and the top-k layers with the highest scores are selected as the set of key sensitive layers.
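As an illustration of the logit-lens scoring described above, the following sketch projects image-token hidden states through an unembedding matrix and averages the target-concept probability. How the calibration factor enters equation (2) is not recoverable from this text, so dividing the logits by it before the softmax is an assumption, as are the names `softmax`, `sensitivity_score`, `unembed`, and `calib`.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def sensitivity_score(image_hidden, unembed, target_id, calib):
    """Mean target-concept probability over image tokens at one layer.
    image_hidden: list of hidden-state vectors (one per image token)
    unembed: vocabulary-by-dimension matrix used as the logit lens
    calib: assumed to rescale logits before the softmax (hypothetical)."""
    probs = []
    for h in image_hidden:
        logits = [sum(w * x for w, x in zip(row, h)) / calib for row in unembed]
        probs.append(softmax(logits)[target_id])
    return sum(probs) / len(probs)
```

A layer whose image-token states already point toward the target concept's vocabulary direction receives a higher score than a layer whose states are uninformative.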
[0020] Furthermore, step S4 specifically includes:
[0021] To ensure that the forgetting process does not affect non-target regions, a binary semantic-aware dynamic mask is generated. When a token in the input sequence belongs to the target concept set, its mask value is 1; when the token belongs to a safe region, such as a visual feature or the general instruction context, its mask value is 0. This mask is used in the subsequent calculation of the loss functions to achieve precise isolation at the token level.
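The mask rule above is simple enough to state directly in code. This is a minimal sketch that assumes tokens are represented by integer ids and that membership in the target concept set is an exact id match; the function name `semantic_mask` is hypothetical.

```python
def semantic_mask(token_ids, target_ids):
    """Return 1 for tokens in the target concept set and 0 for safe tokens."""
    target = set(target_ids)
    return [1 if t in target else 0 for t in token_ids]
```

For example, `semantic_mask([5, 7, 9, 7], [7])` marks the second and fourth positions as target tokens and leaves the others as safe tokens.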
[0022] Furthermore, step S5 specifically includes:
[0023] The adaptive geometric erasure strategy comprises three parts: orthogonal repulsion, gradient response minimization, and composite regularization. The specific calculation process is as follows:
[0024] First, within the semantic fusion layers, the repulsion loss is calculated to sever the connection between the visual and textual representations, as in equation (3):
[0025] (3);
[0026] In equation (3), the layer set is the set of semantic fusion layers determined in step S2; the token set is the set of indices of the target concept tokens, and its size is the total number of target concept tokens; the hidden state is that of a given token at a given layer in the student model; the concept vector is the weight vector of the target concept in the pre-trained language-model head, which represents the direction of the concept in semantic space; and the similarity function is the cosine similarity between vectors.
[0027] Second, within the key sensitive layers, the sensitivity loss is calculated to smooth the decision boundary, as in equation (4):
[0028] (4);
[0029] In equation (4), the layer set is the set of key sensitive layers determined in step S3; the gradient operator takes the gradient with respect to the hidden state; the differentiated quantity is the log-likelihood of the target concept; the norm is the L2 norm; the token set is the set of indices of the target concept tokens, and its size is the total number of target concept tokens. Third, the composite regularization term is calculated, comprising the masked feature distillation loss and the KL divergence constraint, as shown in equations (5) and (6) respectively:
[0030] (5);
[0031] (6);
[0032] In equation (5), the count is the total number of safe tokens under the mask, and the two hidden states are those of a given token at a given layer in the student model and in the teacher model, respectively. In equation (6), the expectation is taken over the retained dataset, and the two parameter sets are those of the student model and the teacher model. Relative entropy, also known as KL divergence, quantifies the difference between two probability distributions; in this method it measures the deviation between the output probability distributions of the student model and the teacher model. It takes two probability distributions as input and returns a single value: the smaller the value, the more similar the two distributions; the larger the value, the greater the difference between them. The remaining symbol denotes the probability distribution output by the model.
[0033] Finally, the overall optimization objective function is calculated, as shown in equation (7):
[0034] (7);
[0035] In equation (7), the three hyperparameters are the weights of the repulsion loss, the feature distillation loss, and the KL divergence constraint, respectively, and are used to balance the forgetting effect against the preservation of the model's general capabilities. The student model parameters are updated by minimizing the overall objective.
[0036] The beneficial effects of this invention are as follows:
[0037] This invention discloses a machine forgetting method for multimodal large language models based on causal orthogonal decoupling. First, it employs a dual-scale hierarchical localization strategy: causal mediation analysis quantifies the causal contribution of visual features to text generation, and the logit lens detects the linear alignment between visual and textual information. Together these pinpoint the semantic fusion layers and the key sensitive layers, matching the progressive fusion characteristics of multimodal information. Second, it implements an adaptive geometric erasure strategy that uses the semantic-aware dynamic mask to apply the repulsion and sensitivity losses within the located layers, forcing the hidden states to be orthogonal to the target concept direction and minimizing the gradient response, thereby decoupling the visual-text association. Finally, it anchors the feature representations of non-target safe regions by introducing a composite regularization term that combines masked feature distillation with a KL divergence constraint. The method addresses the incomplete forgetting or impaired generality that existing techniques suffer from when they neglect multimodal fusion dynamics. It can efficiently and accurately erase specific visual-text association memories without damaging the model's shallow visual perception and general reasoning capabilities.
Description of the Drawings
[0038] Figure 1 is a schematic diagram of the overall architecture in an embodiment of the present invention;
[0039] Figure 2 is a flowchart of the dual-scale hierarchical localization stage in an embodiment of the present invention;
[0040] Figure 3 is a schematic diagram of the principle of the adaptive geometric erasure stage in an embodiment of the present invention;
[0041] Figure 4 is a schematic diagram of the composite regularization mechanism in an embodiment of the present invention.
Detailed Implementation
[0042] The present invention will be further illustrated below with reference to the accompanying drawings and specific embodiments. It should be understood that the following specific embodiments are for illustrative purposes only and are not intended to limit the scope of the invention.
[0043] Example
[0044] This invention provides a machine forgetting method for multimodal large language models based on causal orthogonal decoupling. As shown in Figure 1, it includes the following steps:
[0045] S1. Construct a forgetting framework for a multimodal large language model; the framework includes: a student model to be optimized, a teacher model with frozen parameters, a forgetting dataset, and a retained dataset. In this embodiment, the selected multimodal large language model follows the early modality fusion paradigm, such as the LLaVA-v1.5 series, which comprises a visual encoder, a modality projection layer, and a large language model backbone.
[0046] When building the framework, two copies of the model are initialized. One is the student model to be optimized, whose parameters are updated during the forgetting process; the other is the teacher model, whose parameters are kept frozen throughout to provide supervisory signals for knowledge distillation and to preserve generality. The data are divided into a forgetting dataset and a retained dataset. The forgetting dataset contains the target concept to be forgotten and its corresponding multimodal samples x; the retained dataset contains samples from non-target domains and is used to maintain the model's general cognitive capabilities.
[0047] S2. Perform the macro-level localization stage of dual-scale hierarchical localization: calculate the fusion influence score using causal mediation analysis and locate the semantic fusion layers. This step determines, at the macro level, the range of layers at which visual features causally lead to the generation of the target text concept.
[0048] Specifically, to quantify the causal contribution of the visual features at a given layer to subsequent text generation, this invention performs causal mediation analysis. The target input pairs an image containing the target visual concept with a normal text input; the baseline input pairs a blank image with a normal text input. The visual hidden state of the target input at the given layer is patched into the run on the baseline input, and the fusion influence score is calculated by comparing the hidden states before and after the intervention, as shown in equation (1):
[0049] (1);
[0050] In equation (1), the first index denotes the layer at which the visual feature intervention is currently applied; the second index denotes a subsequent text-processing layer that is affected; and the token index denotes the text token currently being processed. The two hidden states being compared are the intervened hidden state of that token at the affected layer after visual features are injected at the intervention layer, and the original hidden state of the same token at the same layer under the baseline input. The comparison uses the cosine similarity between vectors.
[0051] In this embodiment, a global score is obtained for each layer by aggregating its influence on all subsequent layers. The results show that the shallow layers typically form a "shallow silent zone" with relatively little influence, whereas the scores reach high values in the intermediate layers, reflecting the progressive fusion of the modalities. The localization of the semantic fusion layers therefore needs to cover these layers. A threshold is set to filter out noise, and the contiguous run of layers whose scores exceed the threshold is defined as the set of semantic fusion layers. In the LLaVA model of this embodiment, the semantic fusion layers are typically located in a deep region.
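The thresholding step can be sketched as follows. The text does not say how to proceed when several separate runs of layers exceed the threshold, so this sketch assumes the longest contiguous run is kept; the function name `fusion_layers` and the parameter `tau` are hypothetical.

```python
def fusion_layers(scores, tau):
    """Return the indices of the longest contiguous run of layers
    whose global fusion influence score exceeds the threshold tau."""
    best, run = [], []
    for layer, score in enumerate(scores):
        if score > tau:
            run.append(layer)
            if len(run) > len(best):
                best = list(run)
        else:
            run = []  # an isolated spike does not extend the current run
    return best
```

Keeping only a contiguous run, rather than every layer above the threshold, is what lets an isolated noisy spike in the shallow layers be discarded.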
[0052] S3. Perform the micro-level localization stage of dual-scale hierarchical localization: use the logit lens to calculate key sensitivity scores and locate the key sensitive layers. As shown in Figure 2, this step detects, at the micro level, the layers at which visual features align with the target concept in the vocabulary space. Using the logit lens technique, the hidden states of intermediate layers are projected directly onto the vocabulary space of the pre-trained model. To eliminate the scale differences in logits between layers, a global calibration factor is introduced. The key sensitivity score is calculated as in equation (2):
[0053] (2);
[0054] In equation (2), the layer index identifies the model layer being scored, and the target concept is the concept to be forgotten. The score involves the set of indices of all image tokens in the input, the total number of image tokens in that set, and the index of a specific token in that set. For each such token, the hidden state at the given layer is projected through the logit lens to obtain the softmax probability that it belongs to the target concept. The global calibration factor, defined as the maximum norm of the projected logit vectors over the forgetting dataset, normalizes the scale differences between layers. The layers are ranked by their key sensitivity scores, and the top-k layers with the highest scores are selected as the set of key sensitive layers. In this embodiment, the top 10 layers are selected as the key sensitive layers. These are the layers at which the visual features are most strongly aligned with the target text concept.
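The ranking step amounts to a top-k selection over per-layer scores. A minimal sketch, assuming the scores are already available as a list indexed by layer; the function name `top_k_layers` is hypothetical.

```python
def top_k_layers(scores, k):
    """Return the indices of the k highest-scoring layers, in layer order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])
```

With `k=10` this corresponds to the selection used in this embodiment; unlike the semantic fusion layers, the selected layers need not be contiguous.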
[0055] S4. Generate a semantic-aware dynamic mask to isolate target concept tokens from non-target safe tokens;
[0056] To ensure that the forgetting process operates precisely on the target concept without affecting the surrounding context, this invention generates a binary semantic-aware dynamic mask. Specifically, each token in the input sequence is checked for membership in the target concept set. If the token belongs to a concept to be forgotten, its mask value is 1. If the token belongs to a safe region (such as a visual feature token, general instruction text, or a non-target description), its mask value is 0. This mask is used in the subsequent calculation of the loss functions to achieve precise isolation at the token level.
[0057] S5. Based on the adaptive geometric erasure strategy, apply the dynamic mask at the located layers, calculate each loss component and the total loss function, and optimize the student model parameters to achieve machine forgetting.
[0058] This step constructs the overall optimization objective from three parts: orthogonal repulsion, gradient response minimization, and composite regularization.
[0059] Part 1: Orthogonal Repulsion
[0060] Within the semantic fusion layers determined in step S2, the repulsion loss is calculated. Its aim is to sever the semantic association between visual features and the target text concept. The calculation is shown in equation (3):
[0061] (3);
[0062] In equation (3), the hidden state is that of a given token at a given layer in the student model; the concept vector is the weight vector of the target concept in the pre-trained language-model head, which represents the direction of the concept in semantic space; the similarity function is the cosine similarity between vectors; the layer set is the set of semantic fusion layers determined in step S2; and the token set is the set of indices of the target concept tokens, whose size is the total number of target concept tokens. As shown in Figure 3, minimizing this orthogonal repulsion loss forces the hidden states to be orthogonal to the direction of the target concept.
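A small sketch may help make the orthogonality objective concrete. Equation (3) is not reproduced in this text, so the sketch assumes the loss is the squared cosine similarity between each target-token hidden state and the concept direction, averaged over the semantic fusion layers and target tokens. The names `repulsion_loss`, `hidden`, and `w_c` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def repulsion_loss(hidden, mask, w_c, layers):
    """hidden[layer][token] is a hidden-state vector; mask[token] is 1 for
    target concept tokens. Assumed form: mean squared cosine to w_c."""
    terms = [cosine(hidden[l][i], w_c) ** 2
             for l in layers
             for i, m in enumerate(mask) if m == 1]
    return sum(terms) / len(terms) if terms else 0.0
```

Under this assumed form the loss reaches its minimum of 0 exactly when every masked hidden state is orthogonal to the concept direction, which is the geometric condition the paragraph above describes.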
[0063] Part Two: Gradient Response Minimization
[0064] Within the key sensitive layers determined in step S3, the sensitivity loss is calculated. Its aim is to smooth the decision boundary and prevent minor feature perturbations from triggering the target concept. The calculation is shown in equation (4):
[0065] (4);
[0066] In equation (4), the layer set is the set of key sensitive layers determined in step S3; the gradient operator takes the gradient with respect to the hidden state; the differentiated quantity is the log-likelihood of the target concept; the token set is the set of indices of the target concept tokens, and its size is the total number of target concept tokens; and the norm is the L2 norm. As shown in Figure 3, this sensitivity loss minimizes the sensitivity of the target concept probability to changes in the hidden state.
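The gradient-norm penalty can be illustrated in a simplified setting. This sketch assumes the target log-probability is read directly from a hidden state through a linear unembedding followed by a softmax (a logit-lens-style readout), for which the gradient has the closed form w_c minus the probability-weighted mean of all unembedding rows. In the full model the gradient would also pass through the later layers, which is not modeled here. Using the squared L2 norm is likewise an assumption, and all names are hypothetical.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def grad_log_prob(h, unembed, target_id):
    """Gradient of log softmax(unembed @ h)[target_id] with respect to h."""
    logits = [sum(w * x for w, x in zip(row, h)) for row in unembed]
    p = softmax(logits)
    dim = len(h)
    mean_w = [sum(p[v] * unembed[v][j] for v in range(len(unembed)))
              for j in range(dim)]
    return [unembed[target_id][j] - mean_w[j] for j in range(dim)]

def sensitivity_loss(hidden_states, unembed, target_id):
    """Mean squared L2 norm of the gradient over the given hidden states."""
    norms = [sum(g * g for g in grad_log_prob(h, unembed, target_id))
             for h in hidden_states]
    return sum(norms) / len(norms)
```

A small value means that nudging the hidden state barely changes the target concept's log-probability, which is the flat decision boundary the sensitivity loss aims for.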
[0067] Part 3: Composite Regularization
[0068] As shown in Figure 4, to preserve the model's generality, a masked feature distillation loss and a KL divergence constraint are introduced. The masked feature distillation loss anchors the feature representations of non-target regions (i.e., the safe regions where the mask value is 0), as shown in equation (5):
[0069] (5);
[0070] In equation (5), the count is the total number of safe tokens under the mask, and the two hidden states are those of a given token at a given layer in the student model and in the teacher model, respectively. The KL divergence constraint maintains the stability of the model's overall output distribution and is calculated as shown in equation (6):
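The masked distillation term can be sketched for a single layer as follows. The distance measure in equation (5) is not recoverable from this text, so the squared Euclidean distance between student and teacher hidden states is an assumption; the function name `masked_distill_loss` is hypothetical.

```python
def masked_distill_loss(student, teacher, mask):
    """Mean squared distance between student and teacher hidden states,
    taken only over safe tokens (mask value 0) at one layer."""
    safe = [i for i, m in enumerate(mask) if m == 0]
    if not safe:
        return 0.0
    total = sum(sum((s - t) ** 2 for s, t in zip(student[i], teacher[i]))
                for i in safe)
    return total / len(safe)
```

Because target tokens are excluded, the student is free to move their representations away from the teacher's while the safe tokens stay anchored.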
[0071] (6);
[0072] In equation (6), the expectation is taken over the retained dataset; the two parameter sets are those of the student model and the teacher model; the divergence is the relative entropy (KL divergence); and the remaining symbol denotes the probability distribution output by the model. The overall optimization objective function is the weighted sum of the above components, as shown in equation (7):
[0073] (7);
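The KL constraint and the weighted sum can be sketched together. Two points are assumptions rather than statements of the source: the direction of the divergence (teacher distribution as the reference) and the unit weight on the sensitivity loss, since the text names weights only for the repulsion, feature distillation, and KL terms. The function and parameter names are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def total_loss(l_rep, l_sens, l_fd, l_kl, lam_rep, lam_fd, lam_kl):
    """Assumed weighted sum: the sensitivity loss carries unit weight."""
    return lam_rep * l_rep + l_sens + lam_fd * l_fd + lam_kl * l_kl
```

Here `kl_divergence(teacher_probs, student_probs)` would be averaged over samples from the retained dataset before being passed to `total_loss` as `l_kl`.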
[0074] In this embodiment, to balance the forgetting effect against the retention of general capabilities, hyperparameter weight coefficients are set for the repulsion loss, the feature distillation loss, and the KL divergence constraint. The model is trained with the AdamW optimizer, and LoRA fine-tuning is applied to the LLaVA model. The LoRA parameters of the student model are updated by minimizing the overall objective until the model's performance on the forgetting dataset meets the preset requirements.
[0075] It should be noted that the above content merely illustrates the technical concept of the present invention and should not be construed as limiting the scope of protection of the present invention. For those skilled in the art, various improvements and modifications can be made without departing from the principle of the present invention, and all such improvements and modifications fall within the scope of protection of the claims of the present invention.
Claims
1. A machine forgetting method for multimodal large language models based on causal orthogonal decoupling, characterized in that it comprises the following steps:
S1. Construct a forgetting framework for a multimodal large language model; the framework includes: a student model to be optimized, a teacher model with frozen parameters, a forgetting dataset, and a retained dataset;
S2. Perform the macro-level localization stage of dual-scale hierarchical localization: use causal mediation analysis to calculate the fusion influence score and locate the semantic fusion layers; an input containing the target visual concept is compared with a baseline input replaced with noise, the visual hidden state is patched into the baseline run, and the fusion influence score is calculated as in equation (1): (1); in equation (1), the first index denotes the layer at which the visual feature intervention is currently applied; the second index denotes a subsequent text-processing layer that is affected; the token index denotes the text token currently being processed; the two hidden states being compared are the intervened hidden state of that token at the affected layer after visual features are injected at the intervention layer, and the original hidden state of the same token at the same layer under the baseline input; the comparison uses the cosine similarity between vectors; a threshold is set on the fusion influence score to filter out noise, and the contiguous run of layers whose scores exceed the threshold is defined as the set of semantic fusion layers;
S3. Perform the micro-level localization stage of dual-scale hierarchical localization: use the logit lens to calculate key sensitivity scores and locate the key sensitive layers; the hidden states of intermediate layers are projected onto the vocabulary space and the key sensitivity score is calculated as in equation (2): (2); in equation (2), the layer index identifies the model layer being scored; the target concept is the concept to be forgotten; the score involves the set of indices of all image tokens in the input, the total number of image tokens in that set, and the index of a specific token in that set; for each such token, the hidden state at the given layer is projected through the logit lens to obtain the probability that it belongs to the target concept; the global calibration factor, defined as the maximum norm of the projected logit vectors over the forgetting dataset, normalizes the scale differences between layers; the layers are ranked by their key sensitivity scores, and the top-k layers with the highest scores are selected as the set of key sensitive layers;
S4. Generate a semantic-aware dynamic mask to isolate target concept tokens from non-target safe tokens;
S5. Based on the adaptive geometric erasure strategy, apply the dynamic mask at the located layers, calculate each loss component and the total loss function, and optimize the student model parameters to achieve machine forgetting.
2. The machine forgetting method for multimodal large language models based on causal orthogonal decoupling as described in claim 1, characterized in that: step S4 specifically involves generating a binary semantic-aware dynamic mask to ensure that the forgetting process does not affect non-target regions; when a token in the input sequence belongs to the target concept set, its mask value is 1; when the token belongs to a safe region, such as a visual feature or the general instruction context, its mask value is 0; this mask is used in the subsequent loss function calculations to achieve precise isolation at the token level.
3. The machine forgetting method for multimodal large language models based on causal orthogonal decoupling as described in claim 1, characterized in that: in step S5, the adaptive geometric erasure strategy comprises three parts: orthogonal repulsion, gradient response minimization, and composite regularization;
first, within the semantic fusion layers, the repulsion loss is calculated to sever the connection between the visual and textual representations, as in equation (3): (3); in equation (3), the layer set is the set of semantic fusion layers determined in step S2; the token set is the set of indices of the target concept tokens, and its size is the total number of target concept tokens; the hidden state is that of a given token at a given layer in the student model; the concept vector is the weight vector of the target concept in the pre-trained language-model head, which represents the direction of the concept in semantic space; the similarity function is the cosine similarity between vectors;
second, within the key sensitive layers, the sensitivity loss is calculated to smooth the decision boundary, as in equation (4): (4); in equation (4), the layer set is the set of key sensitive layers determined in step S3; the gradient operator takes the gradient with respect to the hidden state; the differentiated quantity is the log-likelihood of the target concept; the norm is the L2 norm; the token set is the set of indices of the target concept tokens, and its size is the total number of target concept tokens;
third, the composite regularization term is calculated, comprising the masked feature distillation loss and the KL divergence constraint, as shown in equations (5) and (6) respectively: (5); (6); in equation (5), the count is the total number of safe tokens under the mask, and the two hidden states are those of a given token at a given layer in the student model and in the teacher model, respectively; in equation (6), the expectation is taken over the retained dataset; the two parameter sets are those of the student model and the teacher model; the divergence is the relative entropy; and the remaining symbol denotes the probability distribution output by the model;
finally, the overall optimization objective function is calculated, as shown in equation (7): (7); in equation (7), the three hyperparameters are the weights of the repulsion loss, the feature distillation loss, and the KL divergence constraint, respectively, and are used to balance the forgetting effect against the preservation of the model's general capabilities; the student model parameters are updated by minimizing the overall objective.