Method for risk identification and intervention for large-model-generated content based on deep learning

By integrating a multimodal perception model with the Cheetah optimization algorithm, the problem of insufficient multimodal risk identification in existing technologies has been solved, enabling high-precision identification and intelligent intervention of content generated by large models, thereby improving content security and system intelligence.

CN120995204B · Active · Publication Date: 2026-06-26 · GUANGXI POLICE ACAD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: GUANGXI POLICE ACAD
Filing Date: 2025-08-06
Publication Date: 2026-06-26

IPC: G06F40/284

CPC: G06F18/241; G06F18/253; G06F18/2137; G06N3/08; G06N5/045

AI Tagging

Technology Topics

Feature coding; Perception model

AI Technical Summary

Technical Problem

Existing risk identification methods mainly focus on single-modal content analysis and lack a comprehensive understanding of multimodal fusion features. This results in insufficient ability to identify risks with complex semantics, implicit ambiguity, or cross-modal combinations. At the same time, existing intervention strategies lack fine-grained matching between risk levels and content semantics, making it difficult to adapt to the diversity and dynamic changes of risk expressions in content generated by large models.

Method used

We employ a deep learning-based method for identifying and intervening in content generated by a large model. This method integrates a general multimodal perception model with the Cheetah optimization algorithm. Through multimodal data acquisition, feature encoding, global risk feature extraction, dynamic modality focusing, and contextual consistency judgment, we construct a closed-loop optimization system. We also introduce a modality adaptation mechanism and a risk feature interpretability analysis module to achieve high-precision identification and intelligent intervention for complex generated content.

Benefits of technology

It enhances the ability of multimodal perception models to identify potential risks in complex generated content, achieves high-precision risk identification and reasonable intervention, strengthens the adaptability and interpretability of the model, and improves the security of generated content and the intelligence level of application systems.

Abstract

The application discloses a deep-learning-based method for risk identification and intervention for large-model-generated content, comprising the following steps: collecting the multimodal data output by a content generation system and preprocessing it to generate preprocessed multimodal input data; performing feature encoding and fusion with a multimodal perception model to generate global risk feature representation data; inputting the global risk features into a Cheetah Optimization Algorithm to optimize model structure parameters, risk thresholds, and intervention strategy parameters; using the optimized parameters to drive risk identification and hierarchical intervention, outputting the risk level, risk category, and hierarchical intervention measures; and collecting intervention effects and user feedback to drive continuous optimization and adaptive evolution of the parameters. The application achieves efficient risk identification and intelligent intervention for large-model-generated content, and significantly improves the accuracy and automation level of content security management.

Description

Technical Field

[0001] This invention relates to the field of artificial intelligence technology, and in particular to a method for risk identification and intervention of content generated by large models based on deep learning.

Background Technology

[0002] With the rapid development of artificial intelligence technology, large-model content generation (AIGC) has been widely applied in various scenarios such as text generation, image synthesis, speech-to-text transcription, and video generation. In particular, the development of multimodal large language models has enabled content generation systems to integrate multi-source inputs and automatically output highly realistic and semantically rich text and visual content. However, large-model content generation presents potential risks, such as generating false information, inappropriate content, ambiguous expressions, or even prohibited content, which greatly challenges the ability to conduct content review and risk control.

[0003] Existing risk identification methods primarily focus on single-modal content analysis, lacking a comprehensive understanding of multimodal fusion characteristics. This results in insufficient ability to identify risks with complex semantics, implicit ambiguity, or cross-modal combinations. Furthermore, most multimodal perception model structures employ static parameter configurations, making it difficult to adapt to the diverse and dynamically changing risk expressions in large-scale model-generated content. Existing intervention strategies largely rely on manually set rules, lacking fine-grained matching capabilities between risk levels and content semantics, and failing to establish a closed-loop optimization mechanism.

[0004] Therefore, how to provide a deep-learning-based method for risk identification and intervention for content generated by large models is a problem that urgently needs to be solved by those skilled in the art.

Summary of the Invention

[0005] One objective of this invention is to propose a method for risk identification and intervention in large-scale model-generated content based on deep learning. This invention fully integrates a general multimodal perception model and the Cheetah Optimization algorithm, detailing the complete process from multimodal data acquisition, feature encoding, global risk feature extraction, dynamic modality focusing, contextual consistency judgment, to risk level output and intervention strategy selection. By introducing a modality adaptation mechanism, a contextual consistency enhancement module, and a risk feature interpretability analysis module, the multimodal perception model's ability to identify potential risks in complex generated content is improved. An improved Cheetah Optimization algorithm is used to dynamically optimize model structure parameters, risk thresholds, and intervention strategy parameters, constructing a closed-loop evolutionary optimization system. This system possesses advantages such as high risk identification accuracy, strong intervention decision intelligence, and strong adaptive evolutionary capabilities, effectively improving the security and compliance of large-scale model-generated content in practical applications.

[0006] The method for identifying and intervening in content risks generated by large models based on deep learning according to embodiments of the present invention includes:

[0007] Collect the multimodal data output by the target content generation system, preprocess the multimodal data, and generate preprocessed multimodal input data;

[0008] Based on multimodal input data, a multimodal perception model is used for feature encoding, extracting features of each modality and performing cross-modal feature fusion to generate global risk feature representation data;

[0009] Using global risk feature representation data as input, the structural parameters, risk thresholds, and intervention strategy parameters of the multimodal perception model are set as variables to be optimized. The population of the Cheetah Optimization Algorithm is initialized, and the search space and fitness function of the Cheetah Optimization Algorithm are set. The Cheetah Optimization Algorithm is used for global optimization, and the optimal combination of parameters with the best fitness is obtained through multiple rounds of iteration.

[0010] Based on optimized parameter combinations, an optimized multimodal perception model is used to identify risks in multimodal input data, output risk level and risk category results data, and automatically select and execute corresponding graded intervention measures based on the risk level and risk category results data.

[0011] Data on the actual effects of risk intervention measures and user feedback are collected and used as inputs for fitness assessment into the Cheetah optimization algorithm to achieve continuous optimization and adaptive evolution of the multimodal perception model's structural parameters, risk thresholds, and intervention strategy parameters.

[0012] Optionally, the multimodal data specifically includes text data, image data, audio data, and video data.

[0013] Optionally, the preprocessing of multimodal data specifically includes format unification, noise filtering, segmentation, and feature normalization of text data, image data, audio data, and video data.

[0014] Optionally, the generation of global risk feature representation data includes:

[0015] Based on multimodal input data, a multimodal perception model is constructed, which includes a feature encoder, an adaptive modality missing completion module, and a hierarchical progressive fusion layer structure.

[0016] For each modality in the multimodal input data, feature extraction is performed using the corresponding feature encoder to obtain the features of each modality;

[0017] After feature extraction, an adaptive modality missing completion module is employed. This module comprises a modality state detection unit, a missing feature completion unit, and a completed feature fusion unit, specifically:

[0018] The modal state detection unit determines the input state of each modal feature. If a modal feature is detected as missing, or its confidence level is lower than the set modal state discrimination threshold, the input state is determined to be missing and the missing feature completion unit is triggered. The missing feature completion unit takes the other available modal features as input, automatically learns the feature associations between different modalities through the completion network, and infers the completion feature of the missing modality from the feature information of the remaining valid modalities. The completion network consists of a multi-layer fully connected structure, and end-to-end training ensures consistency between the completion features and the actual feature distribution.

[0019] The completion feature fusion unit combines the completion features with the normal modality features to obtain a complete multimodal feature set;

[0020] Based on a complete multimodal feature set, a hierarchical progressive fusion layer structure is adopted. The hierarchical progressive fusion layer structure includes a hierarchical feature fusion unit, a progressive information transmission channel, and an output fusion unit, specifically:

[0021] The multimodal feature sets are sequentially input into low-level, mid-level, and high-level fusion units. Low-level fusion uses a concatenation operation, mid-level fusion uses weighted fusion, and high-level fusion uses dimension-wise gated fusion with gating weights of 1 / 2;

[0022] The progressive information transmission channel connects the fusion output of each layer with the main fusion stream, enabling the transmission and retention of multi-scale feature information;

[0023] The output fusion unit performs a weighted aggregation of the outputs of each fusion layer according to the fusion weight parameters and, combined with the set risk threshold parameters and intervention strategy parameters, generates and outputs the global risk feature representation data.
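The three fusion levels and the output aggregation described above can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the mean-pooling used to bring the concatenated low-level output back to a common dimension, the sigmoid gate, and the layer weights are all assumptions made for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def low_level_fusion(features):
    """Low level: concatenate modality vectors in sorted-name order."""
    fused = []
    for name in sorted(features):
        fused.extend(features[name])
    return fused

def mid_level_fusion(features, modality_weights):
    """Mid level: weighted sum of equal-length modality vectors."""
    total = sum(modality_weights[name] for name in features)
    dim = len(next(iter(features.values())))
    out = [0.0] * dim
    for name, vec in features.items():
        for i, v in enumerate(vec):
            out[i] += modality_weights[name] / total * v
    return out

def high_level_fusion(a, b, gate_logits):
    """High level: per-dimension gate g mixes two streams as g*a + (1-g)*b."""
    gates = [sigmoid(z) for z in gate_logits]
    return [g * x + (1.0 - g) * y for g, x, y in zip(gates, a, b)]

def progressive_fusion(features, modality_weights, gate_logits, layer_weights):
    low = low_level_fusion(features)
    mid = mid_level_fusion(features, modality_weights)
    dim, n = len(mid), len(features)
    # Assumed pooling: average the concatenated blocks so every layer has `dim` entries.
    pooled_low = [sum(low[k * dim + i] for k in range(n)) / n for i in range(dim)]
    # Progressive channel: the high-level unit sees both earlier outputs.
    high = high_level_fusion(mid, pooled_low, gate_logits)
    w_low, w_mid, w_high = layer_weights
    # Output fusion: weighted aggregation of each layer's output.
    return [w_low * l + w_mid * m + w_high * h
            for l, m, h in zip(pooled_low, mid, high)]
```

For example, `progressive_fusion({"text": [1.0, 2.0], "image": [3.0, 4.0]}, {"text": 1.0, "image": 1.0}, [0.0, 0.0], (0.2, 0.3, 0.5))` mixes two two-dimensional modality vectors with a gate value of one half on every dimension.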

[0024] Optionally, obtaining the optimal combination of parameters with the best fitness through multiple rounds of iteration includes:

[0025] The obtained global risk feature representation data is used as the optimization input, and the risk threshold parameters, intervention strategy parameters, modal state discrimination threshold, and gating weights are set as variables to be optimized, forming the set of parameters to be optimized;

[0026] Initialize the Cheetah Optimization Algorithm population and set the population size [missing information]; encode each individual as a set of parameters to be optimized, and set the search space range and the fitness function;

[0027] During the population search process, the cheetah optimization algorithm includes tracking, pursuit, and jumping behaviors, and adds cooperative encirclement and adaptive rest behaviors, specifically:

[0028] When the distance between several cheetah individuals in the search space is less than the cooperation threshold and they are in a high-fitness region, cooperative encirclement behavior is triggered, and the individuals' positions are updated by a weighted average with their current positions using a contraction weight;

[0029] When the fitness of a cheetah individual has not improved for a set number of consecutive generations, adaptive rest behavior is triggered with a rest probability, migrating the individual to a random new position in the search space;

[0030] Each individual is updated in each generation based on tracking, pursuit, jumping, cooperative encirclement, and adaptive rest behaviors to form a new generation of population;

[0031] After multiple rounds of population position and parameter updates and fitness evaluation, the parameter combination with the globally best fitness is obtained when the set maximum number of iterations is reached or the fitness function converges;

[0032] The optimal parameter combination is output, including the optimized risk threshold parameters, intervention strategy parameters, modal state discrimination threshold and gating weight, and is used as the parameter configuration for risk identification and intervention execution.
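The population search with cooperative encirclement and adaptive rest can be sketched as below. This is a simplified stand-in under stated assumptions, not the update rules of the Cheetah Optimization Algorithm or of the patent: the base move (a random step toward the best individual plus Gaussian noise) replaces the tracking, pursuit, and jumping behaviors, and every parameter name and default value (`coop_radius`, `shrink`, `patience`, `rest_prob`) is hypothetical.

```python
import math
import random

def cheetah_style_search(fitness, bounds, pop_size=12, generations=60,
                         coop_radius=0.2, shrink=0.5, patience=5,
                         rest_prob=0.5, seed=0):
    """Maximize `fitness` over box `bounds`; returns (best_position, best_fitness)."""
    rng = random.Random(seed)

    def random_position():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def clip(pos):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(pos, bounds)]

    pop = [random_position() for _ in range(pop_size)]
    fit = [fitness(p) for p in pop]
    stall = [0] * pop_size  # generations without improvement, per individual
    lead = max(range(pop_size), key=lambda i: fit[i])
    best, best_fit = list(pop[lead]), fit[lead]

    for _ in range(generations):
        median_fit = sorted(fit)[pop_size // 2]
        for i in range(pop_size):
            # Assumed base move: step toward the best individual plus noise.
            cand = [v + rng.random() * (b - v) + rng.gauss(0.0, 0.05) * (hi - lo)
                    for v, b, (lo, hi) in zip(pop[i], best, bounds)]
            # Cooperative encirclement: nearby high-fitness individuals pull
            # the candidate toward their centroid with contraction weight `shrink`.
            if fit[i] >= median_fit:
                near = [pop[j] for j in range(pop_size)
                        if j != i and fit[j] >= median_fit
                        and math.dist(pop[i], pop[j]) < coop_radius]
                if near:
                    centroid = [sum(col) / len(near) for col in zip(*near)]
                    cand = [shrink * c + (1.0 - shrink) * g
                            for c, g in zip(cand, centroid)]
            cand = clip(cand)
            cand_fit = fitness(cand)
            if cand_fit > fit[i]:
                pop[i], fit[i], stall[i] = cand, cand_fit, 0
            else:
                stall[i] += 1
                # Adaptive rest: after `patience` stalled generations, relocate
                # to a random position with probability `rest_prob`.
                if stall[i] >= patience and rng.random() < rest_prob:
                    pop[i] = random_position()
                    fit[i] = fitness(pop[i])
                    stall[i] = 0
            if fit[i] > best_fit:
                best, best_fit = list(pop[i]), fit[i]
    return best, best_fit
```

In the setting of the text, each position would encode the risk thresholds, intervention strategy parameters, modal state discrimination threshold, and gating weights, and `fitness` would score that configuration.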

[0033] Optionally, the output risk level and risk category result data includes:

[0034] The obtained optimal parameter combination is applied to the risk identification and intervention configuration of the multimodal perception model, and the optimized multimodal perception model parameters are loaded.

[0035] The multimodal input data and global risk feature representation data are input into the optimized multimodal perception model. A dynamic modal feature focusing mechanism is then employed: based on the gating weights, the modality adaptive weight generator focuses and filters the features of each modality to obtain the focused feature vector;

[0036] The focused feature vector is input into the context consistency discrimination module, which consists of a context feature extraction unit, a feature fusion unit, and a consistency discrimination unit. The context feature extraction unit is used to extract context features of historical inputs, generated fragments, and semantically related data related to the content generation process. The feature fusion unit is used to fuse the context features with the focused feature vector. The consistency discrimination unit is used to perform consistency analysis and discrimination on the fused features and output consistency enhancement features.

[0037] A risk classification and discrimination method based on multiple threshold intervals is adopted. The consistency enhancement features are processed by a multilayer perceptron to output a risk-related score vector, which the final output layer maps to a risk output score. The risk level is then determined according to the optimal risk threshold parameters:

[0038] [formula not preserved in source];

[0039] where the result is the final risk level classification, with 0 representing low risk, 1 representing medium risk, and 2 representing high risk, and the two thresholds are the first and second risk threshold cutoff points obtained by the Cheetah Optimization Algorithm;

[0040] The risk level assessment results are input into the risk feature interpretability analysis module, which consists of a feature importance extraction unit, a contribution calculation unit, and an interpretation matrix generation unit. The feature importance extraction unit identifies the contribution of each modality and feature channel during the assessment process, the contribution calculation unit scores the importance of the identified features, and the interpretation matrix generation unit generates a risk cause interpretation matrix based on the contribution scores.

[0041] Based on the risk level category, optimal intervention strategy parameters, and risk cause explanation matrix, the system automatically selects and executes tiered intervention measures, including content identification, risk warning, content blocking, content replacement, manual review, and interpretable report output. The response intensity and triggering conditions are dynamically adjusted according to the optimal intervention strategy parameters, and the final risk level, risk category, risk cause explanation matrix, and intervention measure execution results are output.
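The formula for the threshold rule in paragraphs [0037] to [0039] did not survive in this copy of the text. One reading consistent with the surrounding definitions is the piecewise rule below. The symbols are assumptions: $s$ stands for the risk output score, $\tau_1 < \tau_2$ for the first and second threshold cutoff points, and $L$ for the risk level; which side of each boundary is inclusive is also assumed.

```latex
L(s) =
\begin{cases}
0, & s < \tau_1 \\
1, & \tau_1 \le s < \tau_2 \\
2, & s \ge \tau_2
\end{cases}
```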

[0042] Optionally, collecting the actual effect data of the risk intervention measures and the user feedback results and using them as inputs to the Cheetah Optimization Algorithm for fitness assessment, so as to achieve continuous optimization and adaptive evolution of the multimodal perception model's structural parameters, risk thresholds, and intervention strategy parameters, includes:

[0043] Collect multi-dimensional intervention effect data on content changes resulting from implemented intervention measures, risk re-identification results, and user behavior responses; collect user feedback results, including user ratings, comment sentiment, report frequency, and interaction behavior information.

[0044] The intervention effect data and user feedback results are standardized and encoded to construct a feedback feature vector;

[0045] The feedback feature vector is input into the Cheetah optimization algorithm. Combined with the current multimodal perception model structural parameters, risk threshold, and intervention strategy parameters as variables to be optimized, fitness assessment is performed. Based on the fitness assessment results, the numerical combination of the variables to be optimized is updated to form the optimized parameter scheme.

[0046] Within a preset optimization execution cycle, based on continuously received feedback feature vectors, the structural parameters, risk thresholds, and intervention strategy parameters of the multimodal perception model are periodically updated to achieve adaptive optimization and evolution of the multimodal perception model's performance.
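The standardization and encoding of feedback into a feature vector might look like the following sketch. The field names (`user_rating`, `report_frequency`, `re_identified_risk`, `comment_sentiment`), the min-max scaling, and the one-hot sentiment encoding are illustrative assumptions; the text does not specify the encoding.

```python
def build_feedback_vector(record, ranges,
                          sentiments=("negative", "neutral", "positive")):
    """Scale numeric fields to [0, 1] and one-hot encode the comment sentiment.

    record: dict of raw feedback values (hypothetical field names)
    ranges: {field: (low, high)} for every numeric field to include
    """
    vector = []
    for field in sorted(ranges):  # fixed field order keeps vectors comparable
        low, high = ranges[field]
        value = min(max(record[field], low), high)  # clamp outliers into range
        vector.append((value - low) / (high - low))
    vector.extend(1.0 if record["comment_sentiment"] == s else 0.0
                  for s in sentiments)
    return vector
```

A vector built this way has a fixed length and scale, so successive feedback batches can be fed to the same fitness evaluation.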

[0047] The beneficial effects of this invention are:

[0048] This invention constructs a content risk identification and intervention method with high accuracy, strong adaptability, and good interpretability by integrating a general multimodal perception model with an improved Cheetah optimization algorithm. In terms of multimodal perception, a modality adaptive weight generation mechanism and a contextual consistency discrimination module are introduced, effectively improving the model's expressive power and discrimination accuracy when processing complex heterogeneous generated data. Especially in scenarios with incomplete multimodal information, fragmented context, or semantic ambiguity, it can still stably output high-confidence risk level results. Through a risk feature interpretability analysis module, the system can clearly indicate the contribution of each modality and its internal feature channels in risk discrimination, enhancing the transparency and user understandability of the algorithm's decision-making.

[0049] Regarding parameter optimization, this invention constructs a three-stage collaborative search mechanism and a feedback-driven fitness evaluation system based on the Cheetah Optimization Algorithm. This not only enhances the global search capability but also enables dynamic updating and adaptive evolution of model structure parameters, risk thresholds, and intervention strategy parameters, ensuring the system maintains optimal risk identification and intervention response capabilities across various content scenarios. Compared to existing methods, this invention achieves more accurate identification, more reasonable intervention, and more efficient evolutionary optimization of risks in large-scale model-generated content, effectively improving the security of generated content and the intelligence level of the application system.

Attached Figure Description

[0050] The accompanying drawings are provided to further illustrate the invention and form part of the specification. They are used in conjunction with embodiments of the invention to explain the invention and do not constitute a limitation thereof. In the drawings:

[0051] Figure 1 is a flowchart of the deep-learning-based method for risk identification and intervention of large-model-generated content proposed in this invention.

[0052] Figure 2 is a schematic diagram of the structure of the multimodal perception model used in the deep-learning-based method for risk identification and intervention of large-model-generated content proposed in this invention.

Detailed Implementation

[0053] The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic diagrams, illustrating only the basic structure of the invention, and therefore only show the components relevant to the invention.

[0054] Referring to Figure 1 and Figure 2, the method for risk identification and intervention of content generated by large models based on deep learning includes:

[0055] Collect the multimodal data output by the target content generation system, preprocess the multimodal data, and generate preprocessed multimodal input data;

[0056] Based on multimodal input data, a multimodal perception model is used for feature encoding, extracting features of each modality and performing cross-modal feature fusion to generate global risk feature representation data;

[0057] Using global risk feature representation data as input, the structural parameters, risk thresholds, and intervention strategy parameters of the multimodal perception model are set as variables to be optimized. The population of the Cheetah Optimization Algorithm is initialized, and the search space and fitness function of the Cheetah Optimization Algorithm are set. The Cheetah Optimization Algorithm is used for global optimization, and the optimal combination of parameters with the best fitness is obtained through multiple rounds of iteration.

[0058] Based on optimized parameter combinations, an optimized multimodal perception model is used to identify risks in multimodal input data, output risk level and risk category results data, and automatically select and execute corresponding graded intervention measures based on the risk level and risk category results data.

[0059] Data on the actual effects of risk intervention measures and user feedback are collected and used as inputs for fitness assessment into the Cheetah optimization algorithm to achieve continuous optimization and adaptive evolution of the multimodal perception model's structural parameters, risk thresholds, and intervention strategy parameters.

[0060] In this embodiment, the multimodal data specifically includes text data, image data, audio data, and video data.

[0061] In this embodiment, the preprocessing of multimodal data specifically includes format unification, noise filtering, segmentation, and feature normalization of text data, image data, audio data, and video data.
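For the text modality, the four preprocessing steps could be sketched as follows. The concrete choices are assumptions made for illustration: Unicode NFKC normalization for format unification, removal of control and format characters for noise filtering, sentence splitting on end punctuation for segmentation, and z-score scaling for feature normalization.

```python
import re
import unicodedata

def preprocess_text(raw):
    """Return a list of cleaned sentence segments."""
    # Format unification: fold compatibility forms (e.g. full-width letters).
    text = unicodedata.normalize("NFKC", raw)
    # Noise filtering: drop control/format characters, keep newlines and tabs.
    text = "".join(ch for ch in text
                   if unicodedata.category(ch)[0] != "C" or ch in "\n\t")
    text = re.sub(r"\s+", " ", text).strip()
    # Segmentation: split after sentence-ending punctuation.
    return [seg for seg in re.split(r"(?<=[.!?])\s+", text) if seg]

def normalize_features(values):
    """Feature normalization: zero mean, unit variance (z-score)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = variance ** 0.5 or 1.0  # avoid division by zero for constant input
    return [(v - mean) / std for v in values]
```

Image, audio, and video data would need their own format, noise, and segmentation handling; only the normalization step carries over unchanged.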

[0062] In this embodiment, generating global risk feature representation data includes:

[0063] Based on multimodal input data, a multimodal perception model is constructed, which includes a feature encoder, an adaptive modality missing completion module, and a hierarchical progressive fusion layer structure.

[0064] For each modality in the multimodal input data, feature extraction is performed using the corresponding feature encoder to obtain the features of each modality;

[0065] After feature extraction, an adaptive modality missing completion module is employed. This module comprises a modality state detection unit, a missing feature completion unit, and a completed feature fusion unit, specifically:

[0066] The modal state detection unit determines the input state of each modal feature. If a modal feature is detected as missing, or its confidence level is lower than the set modal state discrimination threshold, the input state is determined to be missing and the missing feature completion unit is triggered. The missing feature completion unit takes the other available modal features as input, automatically learns the feature associations between different modalities through the completion network, and infers the completion feature of the missing modality from the feature information of the remaining valid modalities. The completion network consists of a multi-layer fully connected structure, and end-to-end training ensures consistency between the completion features and the actual feature distribution.

[0067] The completion feature fusion unit combines the completion features with the normal modality features to obtain a complete multimodal feature set;

[0068] Based on a complete multimodal feature set, a hierarchical progressive fusion layer structure is adopted. The hierarchical progressive fusion layer structure includes a hierarchical feature fusion unit, a progressive information transmission channel, and an output fusion unit, specifically:

[0069] The multimodal feature sets are sequentially input into low-level, mid-level, and high-level fusion units. Low-level fusion uses a concatenation operation, mid-level fusion uses weighted fusion, and high-level fusion uses dimension-wise gated fusion with gating weights of 1 / 2;

[0070] The progressive information transmission channel connects the fusion output of each layer with the main fusion stream, enabling the transmission and retention of multi-scale feature information;

[0071] The output fusion unit performs a weighted aggregation of the outputs of each fusion layer according to the fusion weight parameters and, combined with the set risk threshold parameters and intervention strategy parameters, generates and outputs the global risk feature representation data.
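The adaptive modality missing completion module of this embodiment (state detection, completion, and merging) can be sketched as below. The two-layer fully connected network here is randomly initialized and untrained, purely to show the data flow; in the described method it would be trained end to end. All function names, the hidden width, and the initialization range are assumptions.

```python
import random

def detect_missing(features, confidences, threshold):
    """Modalities that are absent or whose confidence is below the threshold."""
    return [m for m in confidences
            if features.get(m) is None or confidences[m] < threshold]

def dense(x, weights, bias, relu=False):
    """One fully connected layer: y = W x + b, with optional ReLU."""
    y = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, v) for v in y] if relu else y

def init_layer(n_out, n_in, rng):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def complete_modalities(features, confidences, threshold, dim, rng):
    """Return (complete feature set, list of modalities that were completed)."""
    missing = detect_missing(features, confidences, threshold)
    available = [m for m in sorted(confidences) if m not in missing]
    if not available:
        raise ValueError("at least one valid modality is required")
    # Input to the completion network: all valid modality features, concatenated.
    x = [v for m in available for v in features[m]]
    completed = {m: features[m] for m in available}
    for m in missing:
        w1, b1 = init_layer(2 * dim, len(x), rng)  # untrained placeholder weights
        w2, b2 = init_layer(dim, 2 * dim, rng)
        completed[m] = dense(dense(x, w1, b1, relu=True), w2, b2)
    return completed, missing
```

Valid modalities pass through unchanged, so the downstream fusion layers always receive one feature vector per modality.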

[0072] In this embodiment, obtaining the optimal parameter combination with the best fitness through multiple rounds of iteration includes:

[0073] The obtained global risk feature representation data is used as the optimization input, and the risk threshold parameters, intervention strategy parameters, modal state discrimination threshold, and gating weights are set as variables to be optimized, forming the set of parameters to be optimized;

[0074] Initialize the Cheetah Optimization Algorithm population and set the population size [value missing]; encode each individual as a set of parameters to be optimized, and set the search space range and the fitness function:

[0075] [formula not preserved in source];

[0076] where the terms of the fitness function are, in order: the risk identification accuracy under the parameter set to be optimized; the intervention strategy effectiveness score; the volatility of the model's risk output; and the computational resource consumption; each term has a corresponding adaptive weight;

[0077] During the population search process, the cheetah optimization algorithm includes tracking, pursuit, and jumping behaviors, and adds cooperative encirclement and adaptive rest behaviors, specifically:

[0078] When the distance between several cheetah individuals in the search space is less than the cooperation threshold and they are in a high-fitness region, cooperative encirclement behavior is triggered, and the individuals' positions are updated by a weighted average with their current positions using a contraction weight;

[0079] When the fitness of a cheetah individual has not improved for a set number of consecutive generations, adaptive rest behavior is triggered with a rest probability, migrating the individual to a random new position in the search space;

[0080] Each individual is updated in each generation based on tracking, pursuit, jumping, cooperative encirclement, and adaptive rest behaviors to form a new generation of population;

[0081] After multiple rounds of population position and parameter updates and fitness evaluation, the parameter combination with the globally best fitness is obtained when the set maximum number of iterations is reached or the fitness function converges;

[0082] The optimal parameter combination is output, including the optimized risk threshold parameters, intervention strategy parameters, modal state discrimination threshold and gating weight, and is used as the parameter configuration for risk identification and intervention execution.
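The fitness formula of paragraph [0075] is not preserved in this copy. Given the four terms listed in paragraph [0076], one plausible form is a weighted sum that rewards accuracy and intervention effectiveness and penalizes volatility and resource consumption. The symbols and the signs below are assumptions, not the patent's notation: $\theta$ is the parameter set to be optimized, $A$ the risk identification accuracy, $E$ the intervention effectiveness score, $V$ the volatility of the risk output, $C$ the computational resource consumption, and $w_i$ the $i$-th adaptive weight.

```latex
F(\theta) = w_1\,A(\theta) + w_2\,E(\theta) - w_3\,V(\theta) - w_4\,C(\theta),
\qquad w_i \ge 0,\; i = 1,\dots,4
```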

[0083] In this embodiment, the output risk level and risk category result data includes:

[0084] The obtained optimal parameter combination is applied to the risk identification and intervention configuration of the multimodal perception model, and the optimized multimodal perception model parameters are loaded.

[0085] The multimodal input data and global risk feature representation data are input into the optimized multimodal perception model. A dynamic modal feature focusing mechanism is then employed: based on the gating weights, the modality adaptive weight generator focuses and filters the features of each modality to obtain the focused feature vector;

[0086] The focused feature vector is input into the context consistency discrimination module, which consists of a context feature extraction unit, a feature fusion unit, and a consistency discrimination unit. The context feature extraction unit is used to extract context features of historical inputs, generated fragments, and semantically related data related to the content generation process. The feature fusion unit is used to fuse the context features with the focused feature vector. The consistency discrimination unit is used to perform consistency analysis and discrimination on the fused features and output consistency enhancement features.

[0087] A risk classification and discrimination method based on multiple threshold intervals is adopted. The consistency enhancement features are processed by a multilayer perceptron to output a risk-related score vector, which the final output layer maps to a risk output score $s$. The risk level is then assessed according to the optimal risk threshold parameters $\theta_1$ and $\theta_2$:

[0088] $$R = \begin{cases} 0, & s < \theta_1 \\ 1, & \theta_1 \le s < \theta_2 \\ 2, & s \ge \theta_2 \end{cases}$$

[0089] where $R$ denotes the final risk level, with 0 representing low risk, 1 representing medium risk, and 2 representing high risk; $\theta_1$ denotes the first risk threshold cutoff point obtained by the Cheetah optimization algorithm; and $\theta_2$ denotes the second risk threshold cutoff point obtained by the Cheetah optimization algorithm;

[0090] The risk level assessment results are input into the risk feature interpretability analysis module, which consists of a feature importance extraction unit, a contribution calculation unit, and an interpretation matrix generation unit. The feature importance extraction unit identifies the contribution of each modality and feature channel during the assessment process, the contribution calculation unit scores the importance of the identified features, and the interpretation matrix generation unit generates a risk cause interpretation matrix based on the contribution scores.
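One way to picture the risk cause interpretation matrix is shown below. This is a sketch only: it assumes a linear scorer, so that the contribution of a feature channel can be taken as the absolute value of feature times weight, and it normalizes all contributions to sum to 1. The text does not say how contributions are computed, and the function name and input layout are hypothetical.

```python
def explanation_matrix(features, weights):
    """Return normalized per-modality, per-channel contribution scores.

    `features` and `weights` map a modality name to a list of per-channel
    values. Contribution is |feature * weight| (linear-scorer assumption).
    """
    raw = {
        name: [abs(f * w) for f, w in zip(features[name], weights[name])]
        for name in features
    }
    total = sum(sum(row) for row in raw.values()) or 1.0  # avoid division by zero
    return {name: [value / total for value in row] for name, row in raw.items()}
```

Each row of the result corresponds to one modality and each column to one feature channel, so the largest entries point to the inputs that drove the assessment most strongly under this assumption.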

[0091] Based on the risk level category, optimal intervention strategy parameters, and risk cause explanation matrix, the system automatically selects and executes tiered intervention measures, including content identification, risk warning, content blocking, content replacement, manual review, and interpretable report output. The response intensity and triggering conditions are dynamically adjusted according to the optimal intervention strategy parameters, and the final risk level, risk category, risk cause explanation matrix, and intervention measure execution results are output.
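The threshold-based grading and the tiered intervention selection described above can be sketched as follows. The boundary handling (lower bound inclusive), the mapping from risk level to measures, and the `escalate_margin` strategy parameter are illustrative assumptions; the text names the available measures but does not fix which level triggers which measure.

```python
def risk_level(score, theta1, theta2):
    """Map a risk output score to 0 (low), 1 (medium), or 2 (high)."""
    if not theta1 < theta2:
        raise ValueError("theta1 must be smaller than theta2")
    if score < theta1:
        return 0
    if score < theta2:
        return 1
    return 2


# Hypothetical mapping from risk level to intervention measures.
MEASURES = {
    0: [],
    1: ["risk_warning", "content_labeling"],
    2: ["content_blocking", "manual_review", "explainable_report"],
}


def select_interventions(score, theta1, theta2, escalate_margin=0.0):
    """Return (level, measures); medium-risk scores close to theta2 are escalated."""
    level = risk_level(score, theta1, theta2)
    measures = list(MEASURES[level])
    # Assumed strategy parameter: send borderline medium-risk items to review.
    if level == 1 and theta2 - score <= escalate_margin:
        measures.append("manual_review")
    return level, measures
```

Keeping the thresholds and the margin as plain arguments means an optimizer can tune them without touching the decision logic.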

[0092] In this embodiment, the actual effect data of the risk intervention measures and user feedback results are collected and used as inputs to the Cheetah optimization algorithm for fitness assessment, thereby achieving continuous optimization and adaptive evolution of the multimodal perception model's structural parameters, risk thresholds, and intervention strategy parameters, including:

[0093] Collect multi-dimensional intervention effect data on content changes resulting from implemented intervention measures, risk re-identification results, and user behavior responses; collect user feedback results, including user ratings, comment sentiment, report frequency, and interaction behavior information.

[0094] The intervention effect data and user feedback results are standardized and encoded to construct a feedback feature vector;

[0095] The feedback feature vector is input into the Cheetah optimization algorithm. Combined with the current multimodal perception model structural parameters, risk threshold, and intervention strategy parameters as variables to be optimized, fitness assessment is performed. Based on the fitness assessment results, the numerical combination of the variables to be optimized is updated to form the optimized parameter scheme.

[0096] Within a preset optimization execution cycle, based on continuously received feedback feature vectors, the structural parameters, risk thresholds, and intervention strategy parameters of the multimodal perception model are periodically updated to achieve adaptive optimization and evolution of the multimodal perception model's performance.
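The standardization and encoding of feedback into a feature vector, as described in this embodiment, might look like the sketch below. Min-max scaling with clipping, the field names, the value ranges, and the choice to default a missing field to its lower bound are all assumptions made for illustration; the text does not specify the encoding.

```python
def encode_feedback(record, ranges):
    """Encode one feedback record as a vector of values scaled to [0, 1].

    `ranges` maps a field name to its (low, high) bounds. Fields are emitted in
    sorted name order so every vector has the same layout.
    """
    vector = []
    for key in sorted(ranges):
        lo, hi = ranges[key]
        # Missing fields default to the lower bound (assumption), then clip.
        value = min(max(float(record.get(key, lo)), lo), hi)
        vector.append((value - lo) / (hi - lo))
    return vector


# Hypothetical fields: user rating (1-5), comment sentiment (-1 to 1),
# and number of reports in the period (capped at 20).
RANGES = {"rating": (1, 5), "sentiment": (-1, 1), "reports": (0, 20)}
feedback_vector = encode_feedback({"rating": 4, "sentiment": 0.0, "reports": 30}, RANGES)
```

A vector of this kind could then be passed to the fitness assessment in each optimization cycle.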

[0097] Example 1:

[0098] To verify the feasibility of this invention in practice, it was applied to a large social content platform with tens of millions of registered users. Every day, users generate multimodal content, including text, images, and short videos, with a large model. In recent years, the platform has observed that harmful content, misinformation, and sensitive semantics are becoming increasingly multimodal and concealed. Traditional content risk identification systems based on keyword filtering and manual review often suffer from missed detections, misjudgments, and slow response times. For example, according to the platform's statistics for 2024, the daily content volume exceeded 150,000 items, the miss rate for harmful information was as high as 17%, and the average manual review time for high-risk content exceeded 18 minutes. Users repeatedly reported low satisfaction with content security and risk warnings, and content security became a core pain point for the platform's development.

[0099] To address these issues, the platform officially deployed the content risk identification and intervention system proposed in this invention in April 2025. After deployment, the system first automatically collects and standardizes the multimodal content newly generated by platform users each day. It then extracts deep features from text, images, audio, and video using the multimodal perception model and, by applying dynamic modal feature focusing and contextual consistency discrimination, automatically identifies potential risk points. The model distinguishes the weight of each modality in the risk assessment and, together with the Cheetah optimization algorithm, dynamically adjusts the risk thresholds and intervention parameters so that the system can adapt to continuous changes in content format and in how risks manifest.

[0100] After risk identification, the platform system assigns intervention measures according to the content's risk level. For example, high-risk content is automatically removed and alerts are sent to the relevant departments; for medium-risk content, users are notified via pop-up windows and guided to conduct self-checks; and low-risk content is left untouched, serving only as samples for model self-learning. The platform system also continuously collects user feedback on risk alerts and intervention results, as well as user reports, and uses this feedback as input to the fitness assessment of the Cheetah optimization algorithm, enabling periodic adaptive optimization of the system's model parameters.

[0101] Table 1. Comparison of the effects of the You Community Content Security Identification and Intervention System before and after deployment.

[0102]

| Metric | Before deployment (January-March 2025) | After deployment (April-June 2025) | Change (pp = percentage points) |
| --- | --- | --- | --- |
| Average daily content volume (items) | 148,000 | 176,000 | +18.9% |
| Accuracy of harmful content identification (%) | 83.0 | 96.5 | +13.5 pp |
| Manual review response time for high-risk content (minutes) | 18.2 | 1.7 | -90.7% |
| User satisfaction with content risk warnings (%) | 69.3 | 91.6 | +22.3 pp |
| Content review false positive rate (%) | 5.7 | 1.2 | -4.5 pp |
| Content review false negative rate (%) | 17.0 | 3.1 | -13.9 pp |
| Recurrence rate after intervention measures (%) | 11.2 | 2.4 | -8.8 pp |

[0103] Based on the data in Table 1, it is clear that the application of this invention has brought about a significant systemic improvement. With an overall increase in platform content activity, the average daily content volume increased from 148,000 items before deployment to 176,000 items after deployment, representing an increase of 18.9%. While the amount of content increased, the accuracy rate of identifying harmful content significantly improved from 83.0% to 96.5%, an increase of 13.5 percentage points. This indicates that the multimodal deep learning model and intelligent optimization mechanism significantly enhanced the platform's ability to identify complex and risky content.

[0104] The response time for manual review of high-risk content was reduced from 18.2 minutes to 1.7 minutes, a reduction of 90.7%, significantly improving the platform's emergency response efficiency for sensitive content. User satisfaction with content risk warnings increased from 69.3% to 91.6%, a rise of 22.3 percentage points, reflecting users' recognition of the platform's content governance capabilities and security. Meanwhile, the false positive rate for content review decreased from 5.7% to 1.2% (4.5 percentage points), and the false negative rate decreased from 17.0% to 3.1% (13.9 percentage points). This indicates that the system reduced both the pressure on manual review and the omission of content risks. The recurrence rate after intervention measures were implemented decreased from 11.2% to 2.4%, a reduction of 8.8 percentage points, showing that the intervention measures became more targeted and more durable in effect.
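The changes quoted for Table 1 can be recomputed from the before and after values. The short check below is only an arithmetic aid: it treats content volume and response time as relative changes in percent and the rate metrics as differences in percentage points, which is the reading that reproduces the reported figures. The labels `relative` and `points` are names chosen here.

```python
TABLE_1 = [
    ("Average daily content volume (items)", 148000, 176000, "relative"),
    ("Accuracy of harmful content identification (%)", 83.0, 96.5, "points"),
    ("Manual review response time (minutes)", 18.2, 1.7, "relative"),
    ("User satisfaction with risk warnings (%)", 69.3, 91.6, "points"),
    ("False positive rate (%)", 5.7, 1.2, "points"),
    ("False negative rate (%)", 17.0, 3.1, "points"),
    ("Recurrence rate after intervention (%)", 11.2, 2.4, "points"),
]


def change(before, after, kind):
    """Relative change in percent, or absolute difference in percentage points."""
    if kind == "relative":
        return round((after - before) / before * 100, 1)
    return round(after - before, 1)


for name, before, after, kind in TABLE_1:
    unit = "%" if kind == "relative" else " pp"
    print(f"{name}: {change(before, after, kind):+.1f}{unit}")
```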

[0105] The method of this invention has advantages in improving the accuracy of content risk identification, intervention efficiency, and user satisfaction. It not only enhances the platform's intelligent governance capabilities but also provides a practical reference for risk management in similar large-scale model-generated content scenarios.

[0106] The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitutions or modifications made by those skilled in the art within the scope of the technology disclosed in the present invention, based on the technical solution and inventive concept of the present invention, should be covered within the scope of protection of the present invention.

Claims

1. A method for content risk identification and intervention based on deep learning large model generation, characterized in that it comprises:
collecting the multimodal data output by the target content generation system, including text data, image data, audio data, and video data;
preprocessing the multimodal data to generate preprocessed multimodal input data;
based on the multimodal input data, performing feature encoding with a multimodal perception model, extracting the features of each modality, and performing cross-modal feature fusion to generate global risk feature representation data, including:
constructing the multimodal perception model, which comprises feature encoders, an adaptive modality missing completion module, and a hierarchical progressive fusion layer structure;
for each modality in the multimodal input data, performing feature extraction with the corresponding feature encoder to obtain the modal features of that modality;
the adaptive modality missing completion module includes a modality state detection unit, a missing feature completion unit, and a completion feature fusion unit; the modality state detection unit determines the input state of each modal feature, and if a modal feature is detected as missing or its confidence is lower than the set modality state discrimination threshold $\tau_{miss}$, the missing feature completion unit is triggered; the missing feature completion unit takes the other currently available modal features as input, learns the feature relationships between different modalities through a completion network, and infers the completion features of the missing modality from the feature information of the remaining valid modalities, the completion network consisting of multiple fully connected layers; the completion feature fusion unit combines the completion features with the normal modal features to obtain a complete multimodal feature set;
the hierarchical progressive fusion layer structure includes layered feature fusion units, a progressive information transmission channel, and an output fusion unit; the multimodal feature set is input sequentially into the low-level, mid-level, and high-level fusion units, where low-level fusion uses a concatenation operation, mid-level fusion uses weighted fusion, and high-level fusion uses dimension-wise gated fusion with gating weight $g_i$; the progressive information transmission channel connects the fusion output of each layer to the main fusion stream, so that multi-scale feature information is transmitted and retained; the output fusion unit weights and aggregates the outputs of the fusion layers according to the fusion weight parameter $\lambda_l$ and combines the result with the set risk threshold parameters and intervention strategy parameters to generate and output the global risk feature representation data;
taking the global risk feature representation data as input, setting the structural parameters, risk thresholds, and intervention strategy parameters of the multimodal perception model as variables to be optimized, initializing the population of the Cheetah optimization algorithm, setting its search space and fitness function, performing global optimization with the Cheetah optimization algorithm, and obtaining the parameter combination with the best fitness through multiple rounds of iteration;
based on the optimized parameter combination, using the optimized multimodal perception model to identify risks in the multimodal input data, outputting risk level and risk category result data, and automatically selecting and executing the corresponding graded intervention measures according to the risk level and risk category result data;
collecting the actual effect data of the risk intervention measures and the user feedback results, and using them as inputs to the fitness assessment of the Cheetah optimization algorithm, so as to achieve continuous optimization and adaptive evolution of the multimodal perception model's structural parameters, risk thresholds, and intervention strategy parameters.

2. The method for content risk identification and intervention based on deep learning large model generation according to claim 1, characterized in that the preprocessing of the multimodal data specifically includes format unification, noise filtering, segmentation, and feature normalization of the text, image, audio, and video data.

3. The method for content risk identification and intervention based on deep learning large model generation according to claim 1, characterized in that obtaining the parameter combination with the best fitness through multiple rounds of iteration includes:
taking the obtained global risk feature representation data as the optimization input, and setting the risk threshold parameters, intervention strategy parameters, modality state discrimination threshold, and gating weights as variables to be optimized, forming the set of parameters to be optimized;
initializing the Cheetah optimization algorithm population, setting the population size, encoding each individual as one set of parameters to be optimized, and setting the search space range and the fitness function;
during the population search, the Cheetah optimization algorithm includes tracking, pursuit, and jumping behaviors, to which cooperative encirclement and adaptive rest behaviors are added, specifically: when the distance between several cheetah individuals in the search space is less than the cooperation threshold and they are in a high-fitness region, cooperative encirclement behavior is triggered, and the individuals contract toward a weighted average of their current positions according to the contraction weight; when the fitness of a cheetah individual has not improved for a set number of consecutive generations, adaptive rest behavior is triggered with the rest probability, migrating the individual's position to a random new position in the search space;
updating each individual in each generation based on the tracking, pursuit, jumping, cooperative encirclement, and adaptive rest behaviors to form a new generation of the population;
after multiple rounds of population position updates, parameter updates, and fitness evaluation, obtaining the parameter combination with the best global fitness once the set maximum number of iterations is reached or the fitness function converges;
outputting the optimal parameter combination, including the optimized risk threshold parameters, intervention strategy parameters, modality state discrimination threshold, and gating weights, as the parameter configuration for risk identification and intervention execution.

4. The method for content risk identification and intervention based on deep learning large model generation according to claim 1, characterized in that outputting the risk level and risk category result data includes:
applying the obtained optimal parameter combination to the risk identification and intervention configuration of the multimodal perception model, and loading the optimized multimodal perception model parameters;
inputting the multimodal input data and the global risk feature representation data into the optimized multimodal perception model, and applying a dynamic modal feature focusing mechanism in which, based on the gating weights, the modality adaptive weight generator focuses and filters the features of each modality to obtain the focused feature vector;
inputting the focused feature vector into the context consistency discrimination module, which consists of a context feature extraction unit, a feature fusion unit, and a consistency discrimination unit, wherein the context feature extraction unit extracts context features from the historical inputs, generated fragments, and semantically related data associated with the content generation process, the feature fusion unit fuses the context features with the focused feature vector, and the consistency discrimination unit performs consistency analysis and discrimination on the fused features and outputs consistency enhancement features;
adopting a risk classification and discrimination method based on multiple threshold intervals, in which the consistency enhancement features are processed by a multilayer perceptron to output a risk-related score vector, the final output layer maps the score vector to a risk output score $s$, and the risk level is assessed according to the optimal risk threshold parameters $\theta_1$ and $\theta_2$:
$$R = \begin{cases} 0, & s < \theta_1 \\ 1, & \theta_1 \le s < \theta_2 \\ 2, & s \ge \theta_2 \end{cases}$$
where $R$ denotes the final risk level, with 0 representing low risk, 1 representing medium risk, and 2 representing high risk; $\theta_1$ denotes the first risk threshold cutoff point obtained by the Cheetah optimization algorithm; and $\theta_2$ denotes the second risk threshold cutoff point obtained by the Cheetah optimization algorithm;
inputting the risk level assessment results into the risk feature interpretability analysis module, which consists of a feature importance extraction unit, a contribution calculation unit, and an interpretation matrix generation unit, wherein the feature importance extraction unit identifies the contribution of each modality and feature channel during the assessment, the contribution calculation unit scores the importance of the identified features, and the interpretation matrix generation unit generates a risk cause interpretation matrix from the contribution scores;
based on the risk level, the optimal intervention strategy parameters, and the risk cause interpretation matrix, automatically selecting and executing tiered intervention measures, including content identification, risk warning, content blocking, content replacement, manual review, and interpretable report output, with the response intensity and triggering conditions dynamically adjusted according to the optimal intervention strategy parameters, and outputting the final risk level, risk category, risk cause interpretation matrix, and intervention execution results.

5. The method for content risk identification and intervention based on deep learning large model generation according to claim 1, characterized in that collecting the actual effect data of the risk intervention measures and the user feedback results and using them as inputs to the fitness assessment of the Cheetah optimization algorithm, so as to achieve continuous optimization and adaptive evolution of the multimodal perception model's structural parameters, risk thresholds, and intervention strategy parameters, includes:
collecting multi-dimensional intervention effect data covering the content changes resulting from the implemented intervention measures, the risk re-identification results, and user behavior responses, and collecting user feedback results, including user ratings, comment sentiment, report frequency, and interaction behavior information;
standardizing and encoding the intervention effect data and the user feedback results to construct a feedback feature vector;
inputting the feedback feature vector into the Cheetah optimization algorithm, performing fitness assessment with the current structural parameters, risk thresholds, and intervention strategy parameters of the multimodal perception model as variables to be optimized, and updating the numerical combination of the variables to be optimized according to the fitness assessment results to form the optimized parameter scheme;
within a preset optimization execution cycle, periodically updating the structural parameters, risk thresholds, and intervention strategy parameters of the multimodal perception model based on the continuously received feedback feature vectors, so as to achieve adaptive optimization and evolution of the multimodal perception model's performance.