Risk detection method and apparatus
By combining a multi-stage large model architecture with domain knowledge and context enhancement, the computational resource consumption and misjudgment problems of end-to-end risk control of large models are solved, achieving high throughput and high accuracy in content risk identification, thus meeting actual business needs.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- ALIPAY (HANGZHOU) INFORMATION TECH CO LTD
- Filing Date
- 2026-03-04
- Publication Date
- 2026-06-26
AI Technical Summary
Directly applying large models end-to-end for content risk control is difficult to meet the high requirements of actual business scenarios, including high consumption of computing resources, high inference latency, lack of domain adaptability, high false positive rate, difficulty in balancing broad coverage and precise identification, and neglect of the collaborative use of risk judgment rules and contextual information.
It adopts a cascaded processing flow, using a multi-stage large model architecture with lightweight, medium-complexity, and high-complexity models, combined with domain knowledge and context enhancement, to dynamically schedule model resources of different complexities, thereby achieving refined and highly accurate identification of content risks.
While ensuring high throughput processing capabilities, it achieves refined and highly accurate identification of content risks, meeting the comprehensive needs of performance, cost and effectiveness in actual business scenarios, significantly reducing system resource consumption and improving identification accuracy.
Figure CN122285877A_ABST
Abstract
Description
Technical Field
[0001] This specification relates to the field of risk control technology, and in particular to a method and apparatus for intelligent risk control using large models, a computer-readable storage medium, and a computing device.
Background Technology
[0002] With the rapid development of internet platforms, user-generated content (UGC) has experienced explosive growth, encompassing various modalities such as text, images, audio, and video. While improving the efficiency of information dissemination, it has also brought the risk of spreading a large amount of illegal, vulgar, false, or harmful content. Content risk control, as a core mechanism for ensuring a clean cyberspace, maintaining platform compliance, and protecting user safety, aims to prevent the spread of high-risk content by using automated methods to identify, classify, and handle content in real time. It not only concerns the platform's legal compliance and social responsibility but also directly impacts user experience and the health of the business ecosystem, making it an indispensable key capability in internet infrastructure.
[0003] In recent years, large language models (LLMs) have made groundbreaking progress, demonstrating significant advantages in semantic understanding, contextual reasoning, and multi-task generalization. The industry has begun exploring the application of large models in content risk control scenarios, hoping to leverage their powerful semantic modeling capabilities to improve the accuracy of identifying complex, obscure, and adversarial content violations, overcoming the limitations of traditional keyword matching, rule engines, or shallow machine learning models, such as weak generalization ability and high false positive rates.
[0004] However, directly applying large models end-to-end to content risk control has limitations and cannot meet the higher requirements of actual business scenarios.
Summary of the Invention
[0005] This specification describes a risk detection method and apparatus that can solve the above-mentioned technical problems.
[0006] According to the first aspect, a risk detection method is provided. The method includes: inputting target content to be detected into a first large model to obtain a first prediction result, which indicates that the target content is high-risk, low-risk, or has an undetermined risk status. In response to the first prediction result indicating an undetermined risk status, inputting the target content and related risk discrimination rules into a second large model to obtain a second prediction result, which indicates a risk type to be detected or that the target content is low-risk. In response to the second prediction result indicating a risk type to be detected, invoking a third large model corresponding to that risk type to process the target content and related contextual information, generating a target risk detection result.
[0007] In one embodiment, before inputting the target content to be detected into the first large model, the method further includes: selecting, from a risk screening model set, the first large model that supports the modality of the target content.
[0008] In one embodiment, the method further includes: in response to the first prediction result indicating low risk, allowing the publication of the target content; or, in response to the first prediction result indicating high risk, intercepting the target content.
[0009] In one embodiment, in response to the first prediction result indicating that the risk status is pending, the target content and related risk discrimination rules are input into the second large model, including: in response to the first prediction result indicating that the risk status for several risk types is pending, retrieving a preset risk discrimination rule library based on the several risk types to obtain the corresponding risk discrimination rules; and inputting the target content and the retrieved risk discrimination rules into the second large model.
[0010] In one embodiment, in response to the first prediction result indicating that the risk status is pending, the target content and related risk discrimination rules are input into the second large model, including: in response to the first prediction result indicating that the risk status is pending, retrieving a preset risk discrimination rule library based on the target content to obtain related risk discrimination rules; and inputting the target content and the retrieved risk discrimination rules into the second large model.
[0011] In one embodiment, the method further includes: in response to the second prediction result indicating low risk, allowing the publication of the target content.
[0012] In one embodiment, the risk type to be detected comprises multiple risk types; wherein generating the target risk detection result includes: respectively calling multiple third large models corresponding to the multiple risk types to process the target content and related context information, obtaining the risk detection result output by each of the third large models, and combining these results to form the target risk detection result.
[0013] In one embodiment, determining the relevant context information includes: retrieving a preset risk discrimination rule base based on the risk type to be detected to obtain the corresponding risk discrimination rule; and / or, retrieving a preset domain knowledge base based on the target content to obtain relevant domain knowledge; and / or, obtaining historical submission content within a predetermined backtracking period before the target content was submitted in the business scenario to which the target content belongs; and / or, obtaining content published by the publisher of the target content in other business scenarios; and / or, obtaining user profile features and historical network behavior sequences of the publisher of the target content.
[0014] In one embodiment, after generating the target risk detection result, the method further includes: constructing a target sample based on the target content and the target risk detection result to determine a historical sample set; using a fourth model to identify misjudged samples based on the historical sample set; performing attribution analysis on the misjudged samples using a fifth model; and processing the analysis results of the attribution analysis using a sixth model to obtain optimization suggestions for risk detection.
[0015] Furthermore, in a specific embodiment, determining the historical sample set includes: incorporating the target sample into the initial sample set; performing semantic clustering on the multiple detected contents involved in the initial sample set to obtain multiple clusters; and sampling each of the multiple clusters and incorporating the samples corresponding to the sampled contents into the historical sample set.
[0016] In another specific embodiment, determining the historical sample set includes: incorporating the target sample into the initial sample set; determining, from the initial sample set, several groups of samples that meet a predetermined condition, wherein the predetermined condition is that the detected contents of the multiple samples within a group are semantically similar but their risk detection results differ; and incorporating the several groups of samples into the historical sample set.
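The two ways of building the historical sample set described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `Sample` class, the greedy threshold clustering, the cosine threshold of 0.9, and the label strings are all assumptions, and the embeddings are assumed to come from some embedding model that the source does not name.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Sample:
    content: str
    embedding: Sequence[float]  # assumed output of some embedding model
    result: str                 # risk detection result (hypothetical labels)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster(samples: List[Sample], threshold: float = 0.9) -> List[List[Sample]]:
    """Greedy clustering: a sample joins the first cluster whose first
    member is at least `threshold` similar, otherwise it starts a new one."""
    clusters: List[List[Sample]] = []
    for s in samples:
        for c in clusters:
            if cosine(s.embedding, c[0].embedding) >= threshold:
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters


def sample_per_cluster(clusters: List[List[Sample]],
                       per_cluster: int = 1, seed: int = 0) -> List[Sample]:
    """Strategy of [0015]: draw up to `per_cluster` samples from each cluster."""
    rng = random.Random(seed)
    picked: List[Sample] = []
    for c in clusters:
        picked.extend(rng.sample(c, min(per_cluster, len(c))))
    return picked


def conflicting_groups(clusters: List[List[Sample]]) -> List[List[Sample]]:
    """Strategy of [0016]: keep groups whose members are semantically
    similar but carry different risk detection results."""
    return [c for c in clusters if len({s.result for s in c}) > 1]
```

A production system would more likely use a library clustering algorithm over real embeddings; the greedy pass here only keeps the example dependency-free.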
[0017] On the other hand, in a specific embodiment, using the fifth large model to perform attribution analysis on the misjudged samples includes: using the fifth large model to process the misjudged samples and corresponding context information, which includes relevant knowledge retrieved from the knowledge base and background information searched using Internet tools; wherein the attribution analysis indicates that the cause of misjudgment includes the lack of the background information, and the optimization suggestion includes supplementing the knowledge base with knowledge extracted from the background information.
[0018] In one embodiment, after generating the target risk detection result, the method further includes: constructing a target sample using the target content and the target risk detection result, and incorporating it into the original sample set; and selecting a high-quality sample set from the original sample set for training the third large model.
[0019] Furthermore, in a specific embodiment, after selecting a high-quality sample set from the original sample set, the method further includes: training the third large model on the high-quality sample set using reinforcement learning. The reward function used in the reinforcement learning takes the form of a reward score table, in which the entry in row i and column j is the reward score for classifying a sample of risk type i as risk type j.
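A reward score table of this kind can be illustrated with a small lookup, shown below as a sketch. The risk type names and every score are hypothetical placeholders; the source specifies only the row/column meaning of the table, not its contents.

```python
from typing import Dict, List, Tuple

# Hypothetical risk types; the source does not enumerate them.
RISK_TYPES: List[str] = ["none", "fraud", "vulgar"]

# REWARD_TABLE[i][j] is the reward for classifying a sample whose true
# risk type is RISK_TYPES[i] as RISK_TYPES[j]. All values are illustrative.
REWARD_TABLE: List[List[float]] = [
    # predicted: none  fraud  vulgar
    [1.0, -0.5, -0.5],   # true type: none
    [-2.0, 1.0, -1.0],   # true type: fraud (a miss is penalized most here)
    [-1.0, -0.5, 1.0],   # true type: vulgar
]

INDEX: Dict[str, int] = {name: k for k, name in enumerate(RISK_TYPES)}


def reward(true_type: str, predicted_type: str) -> float:
    """Look up the reward for one (true, predicted) pair."""
    return REWARD_TABLE[INDEX[true_type]][INDEX[predicted_type]]


def mean_reward(pairs: List[Tuple[str, str]]) -> float:
    """Average reward over a batch of (true, predicted) pairs."""
    return sum(reward(t, p) for t, p in pairs) / len(pairs)
```

Because the table need not be symmetric, it can penalize a missed high-severity type more heavily than a false alarm, which a plain accuracy reward cannot express.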
[0020] In one embodiment, the parameter sizes of the first, second, and third large models increase progressively.
[0021] According to the second aspect, a risk detection method is provided. The method includes: inputting target content to be detected and the user characteristics of the user who published it into a first model to obtain a first prediction result, which indicates that the publishing user is a high-risk user, is a low-risk user, or has a pending risk status. In response to the first prediction result indicating a pending risk status, inputting the target content, user characteristics, and related risk discrimination rules into a second model to obtain a second prediction result, which indicates a user risk type to be detected or that the publishing user is a low-risk user. In response to the second prediction result indicating a user risk type to be detected, invoking a third model corresponding to that user risk type to process the target content, user characteristics, and related contextual information to generate a target risk detection result.
[0022] According to a third aspect, a risk detection device is provided. The device includes: a first prediction unit configured to input target content to be detected into a first large-scale model to obtain a first prediction result, which indicates that the target content is high-risk, low-risk, or has an undetermined risk status. A second prediction unit configured to, in response to the first prediction result indicating an undetermined risk status, input the target content and related risk discrimination rules into a second large-scale model to obtain a second prediction result, which indicates a risk type to be detected or that the target content is low-risk. A third prediction unit configured to, in response to the second prediction result indicating a risk type to be detected, invoke a third large-scale model corresponding to that risk type to process the target content and related context information, and generate a target risk detection result.
[0023] According to the fourth aspect, a risk detection device is provided. The device includes: a first prediction unit configured to input target content to be detected and user characteristics of the user who published it into a first model to obtain a first prediction result, indicating that the publishing user is a high-risk user, is a low-risk user, or has a pending risk status. A second prediction unit configured to, in response to the first prediction result indicating a pending risk status, input the target content, user characteristics, and related risk discrimination rules into a second model to obtain a second prediction result, indicating a user risk type to be detected or that the publishing user is a low-risk user. A third prediction unit configured to, in response to the second prediction result indicating a user risk type to be detected, invoke a third model corresponding to that user risk type to process the target content, user characteristics, and related contextual information to generate a target risk detection result.
[0024] According to a fifth aspect, a computer-readable storage medium is provided having a computer program stored thereon, which, when executed in a computer, causes the computer to perform the method provided in the first or second aspect.
[0025] According to a sixth aspect, a computing device is provided, including a memory and a processor, wherein the memory stores executable code, and the processor, when executing the executable code, implements the method provided in the first or second aspect.
[0026] In summary, the methods and apparatus disclosed in the embodiments of this specification introduce a novel risk control architecture that balances efficiency, accuracy, and scalability. It can combine domain knowledge and context enhancement to dynamically schedule model resources at different levels, ensuring high throughput processing capabilities while achieving refined and highly accurate identification of content risks or subject risks. This better meets the comprehensive needs for performance, cost, and effectiveness in actual business scenarios.
Attached Figure Description
[0027] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the following description of the embodiments will be briefly introduced. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0028] Figure 1 is a schematic diagram of the implementation architecture of the risk detection scheme disclosed in the embodiments of this specification;
[0029] Figure 2 is a schematic diagram of the components of the context engineering module disclosed in the embodiments of this specification;
[0030] Figure 3 is a schematic diagram of the process steps of the content risk detection method disclosed in the embodiments of this specification;
[0031] Figure 4 is a schematic diagram of the process steps for performing quality inspection and attribution after content risk detection, as disclosed in the embodiments of this specification;
[0032] Figure 5 is a schematic diagram of the process steps of the subject risk detection method disclosed in the embodiments of this specification;
[0033] Figure 6 is a functional structure diagram of the risk detection device disclosed in the embodiments of this specification;
[0034] Figure 7 is a functional structure diagram of the subject risk detection device disclosed in the embodiments of this specification.
Detailed Implementation
[0035] The solution provided in this specification will now be described with reference to the accompanying drawings.
[0036] As mentioned earlier, directly applying large-scale models end-to-end for content risk control has limitations and cannot meet the higher requirements of real-world business scenarios. Specifically:
[0037] First, large models typically consume significant computational resources and have high inference latency, making it difficult to meet the demands of high-concurrency, low-latency online business. Second, general-purpose large models lack domain adaptability for specific risk types, making them prone to missed detections or false positives. Third, a single model architecture struggles to balance the dual goals of "broad coverage" and "precise identification": calling heavy-duty large models for all content leads to uncontrollable system costs, while relying solely on lightweight models leaves the system unable to distinguish boundary cases (such as semantic ambiguity, irony, and metaphor). Furthermore, existing solutions often neglect the synergistic use of risk identification rules and contextual information, resulting in a lack of interpretability and business alignment in model decisions.
[0038] Based on the above observations and analysis, the applicant proposes a novel content risk control architecture that balances efficiency, accuracy, and scalability. This architecture combines domain knowledge and contextual enhancement to dynamically schedule model resources of varying complexity. While ensuring high throughput, it achieves refined and highly accurate identification of content risks, thereby better meeting the comprehensive needs for performance, cost, and effectiveness in real-world business scenarios.
[0039] Figure 1 illustrates a novel multi-stage collaborative content risk control architecture for efficient and accurate risk identification of user-submitted target content (such as text, images, or multimedia information). The architecture employs a cascaded processing flow, sequentially including an initial risk screening stage, a task routing stage, and a refined judgment stage. The models used in each stage progressively increase in structural complexity and computational resource consumption, thereby preserving overall system throughput while achieving high-precision identification of high-risk content.
[0040] Specifically, in the initial risk screening stage, the target content is first input into a lightweight initial screening model (such as a distilled large model). This model quickly outputs a preliminary risk assessment result, classifying the content into one of three categories: "high risk," "low risk," or "risk status pending." Content clearly determined to be low risk is directly approved; high-risk content may trigger immediate interception or manual review; and content that falls on a fuzzy boundary and whose risk status is difficult to determine (i.e., "risk status pending") proceeds to the next stage.
[0041] During the task routing stage, the target content with a pending risk status, along with its associated preset risk discrimination rules, is input into a medium-complexity task routing model. This model identifies the most likely risk type or further confirms the content as low-risk. If the content is confirmed as low-risk, it is directly approved; otherwise, if the output indicates one or more specific risk types, the content is routed to the corresponding dedicated analysis channel.
[0042] In the refined judgment stage, the system calls a dedicated refined judgment model based on the risk type, and combines the target content and its enhanced contextual information (such as publisher profile, historical behavior sequence, domain knowledge base retrieval results, etc.) to perform deep semantic analysis and contextual reasoning, and finally generate a high-confidence target risk detection result.
[0043] It is worth noting that the number of model parameters, network depth, or inference computation load increases in order from the risk screening model to the task routing model to the dedicated refined judgment model, forming a three-tiered "light-medium-heavy" model hierarchy. This design balances processing efficiency and recognition accuracy: the vast majority of low-risk content is filtered out quickly in the initial screening stage, and only a small number of difficult samples enter the high-cost refined judgment stage. This significantly reduces overall system resource consumption while ensuring that the detection rate and accuracy for key risk content meet actual business needs.
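The three-stage flow described above can be sketched as a small dispatcher. This is a hedged illustration only: the `Cascade` class, its field names, the status strings, and the returned dictionaries are assumptions introduced here, and the lambda stubs in `demo_cascade` stand in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

HIGH, LOW, PENDING = "high_risk", "low_risk", "pending"


@dataclass
class Cascade:
    # Stage 1: lightweight screening model, returns HIGH, LOW, or PENDING.
    screen: Callable[[str], str]
    # Retrieves risk discrimination rules for the content (hypothetical hook).
    rules_for: Callable[[str], List[str]]
    # Stage 2: routing model, returns risk types; an empty list means low risk.
    route: Callable[[str, List[str]], List[str]]
    # Builds enhanced context for one risk type (hypothetical hook).
    context_for: Callable[[str, str], dict]
    # Stage 3: one dedicated judgment model per risk type.
    judges: Dict[str, Callable[[str, dict], str]]

    def detect(self, content: str) -> dict:
        first = self.screen(content)
        if first == LOW:
            return {"stage": 1, "action": "allow"}
        if first == HIGH:
            return {"stage": 1, "action": "intercept"}
        # Pending: only now is the medium-complexity model invoked.
        risk_types = self.route(content, self.rules_for(content))
        if not risk_types:
            return {"stage": 2, "action": "allow"}
        # One heavy model call per routed risk type.
        results = {
            rt: self.judges[rt](content, self.context_for(content, rt))
            for rt in risk_types
        }
        return {"stage": 3, "results": results}


def demo_cascade() -> Cascade:
    """Toy stubs in place of real models, for illustration only."""
    return Cascade(
        screen=lambda c: LOW if "hello" in c else (HIGH if "banned" in c else PENDING),
        rules_for=lambda c: ["placeholder rule text"],
        route=lambda c, rules: ["promotion"] if "buy" in c else [],
        context_for=lambda c, rt: {"risk_type": rt},
        judges={"promotion": lambda c, ctx: "violation"},
    )
```

The early returns are what make the hierarchy cheap: the heavier callables are reached only when every lighter stage has declined to decide.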
[0044] Next, we will introduce the context engineering module in the above-mentioned new content risk control architecture, which is used to provide accurate and comprehensive context information required for risk control on demand.
[0045] The context engineering module can dynamically construct an enhanced context containing multi-source heterogeneous information for each inference step, thereby improving the risk understanding and decision-making capabilities of the models at each stage (especially the dedicated refined judgment model in the refined judgment stage) regarding the current content. As shown in Figure 2, the core components of the context engineering module include four categories of elements: instructions, knowledge, memory, and tools.
[0046] 1. Instructions are judgment rules dynamically retrieved and injected from a pre-built rule base. They are used to guide the large model to make risk assessments based on business compliance requirements, covering:
[0047] (1) Bottom-line risk instructions are used to identify content that violates laws and regulations.
[0048] (2) Public order and good morals instructions are used to identify content that violates social morality and civilized norms, including vulgarity and insults.
[0049] (3) Business experience instructions are used to identify behaviors that affect user experience or damage the community ecosystem, such as promotion and traffic generation, or behaviors that cause discomfort.
[0050] (4) Personalized instructions are regulatory compliance requirements customized for specific vertical business scenarios, such as identifying illegal securities consultation, stock recommendation, and fund recommendation behaviors in financial content communities.
[0051] (5) Quality understanding instructions are quality assessment rules for content distribution and recommendation scenarios, covering aspects such as the aesthetics and professionalism of the content.
[0052] (6) Subjective cognitive instructions are rules for identifying user behavioral intentions, such as inducing bad behavior.
[0053] The total number of these instructions can reach several hundred.
[0054] The aforementioned instructions are generally not statically embedded, but rather retrieved from the rule base based on the semantic features of the current input content using the "instruction retrieval" function in the tool module. The instructions are then dynamically injected into the context to ensure that the rules are highly aligned with the task.
[0055] 2. Knowledge is structured or semantic domain information used to support the model's accurate identification of technical terms, sensitive entities, etc., effectively avoiding misjudgments or omissions that can easily occur when relying solely on the general knowledge of large models. Knowledge can be organized in the following two forms:
[0056] (1) Key-value knowledge, with an indexable identifier as the key and the corresponding knowledge description as the value. When the input content matches a key, the system automatically recalls its value.
[0057] (2) Semantic fragment (Chunk) knowledge: unstructured long texts from sources such as forums and encyclopedias are segmented into semantically coherent knowledge fragments of appropriate length, and semantic similarity retrieval is achieved through vector embedding.
[0058] The retrieval of the aforementioned knowledge can rely on the "knowledge retrieval" function in the tool module. Before reasoning, relevant key-value items or chunk fragments can be retrieved from the knowledge base in real time based on the semantic representation of the target content and injected into the model as part of the context, significantly enhancing its domain reasoning ability.
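The two knowledge forms can be illustrated with a toy retriever, sketched below. Substring matching for keys and brute-force cosine ranking for chunks are simplifying assumptions; the function names are hypothetical, and a real deployment would presumably use an embedding model and a vector index, neither of which the source names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def recall_key_value(content: str, kv_store: Dict[str, str]) -> Dict[str, str]:
    """Key-value knowledge: recall every entry whose key occurs in the
    content (case-insensitive substring match, an assumption)."""
    lowered = content.lower()
    return {k: v for k, v in kv_store.items() if k.lower() in lowered}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec: Sequence[float],
                 chunks: List[Tuple[str, Sequence[float]]],
                 k: int = 2) -> List[str]:
    """Chunk knowledge: rank (text, embedding) pairs by similarity to the
    query embedding and return the k best texts."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

The recalled values and chunk texts would then be concatenated into the model input as part of the enhanced context.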
[0059] 3. Memory provides personalized and contextualized analytical background, resolving the ambiguity in judgment caused by the lack of context when analyzing single content fragments. Memory information mainly includes:
[0060] (1) Contextual information, including context within the same scenario (such as the main post to which the current comment belongs or the previous level comment) and context across scenarios (such as the historical content published by the user in other business scenarios).
[0061] (2) Scene background information, used to identify the business type to which the current content belongs (such as financial community, game community, general social scene, etc.), and different scenarios correspond to different risk focus.
[0062] (3) User profile, which covers basic user attributes (such as registration duration and real-name authentication status) and risk attributes (such as the number of historical violations and the frequency of being reported).
[0063] (4) Behavior sequence, which records the key temporal behavior trajectory of users, such as modifying information, adding friends, and adding notes when sending red envelopes / transfers.
[0064] The acquisition of the aforementioned memory data can rely on the "content reverse lookup" function in the tool module. This function can query the user's historical behavior logs, related content streams, and scene metadata in real time based on the current content request, and construct complete, personalized contextual information.
[0065] 4. Tools, as the execution engine of the context engineering module, not only support the dynamic acquisition of instructions, knowledge, and memory, but also provide other key reasoning enhancement capabilities. Specifically, these include:
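A content reverse lookup over a submission log might look like the sketch below. The `Submission` record, its fields, the per-user filtering, and the lookback window are all assumptions made for illustration; the source does not describe the log schema or the query interface.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Submission:
    user_id: str
    scene: str            # business scenario, e.g. "forum" (hypothetical)
    text: str
    submitted_at: datetime


def same_scene_history(log: List[Submission], target: Submission,
                       lookback: timedelta) -> List[Submission]:
    """Earlier submissions by the same user in the same scenario that fall
    inside the lookback window ending at the target's submission time."""
    start = target.submitted_at - lookback
    return [
        s for s in log
        if s.user_id == target.user_id
        and s.scene == target.scene
        and start <= s.submitted_at < target.submitted_at
    ]


def cross_scene_history(log: List[Submission],
                        target: Submission) -> List[Submission]:
    """Content the same user published in other business scenarios."""
    return [
        s for s in log
        if s.user_id == target.user_id and s.scene != target.scene
    ]
```

Bounding the same-scenario query by a window keeps the injected context short enough to fit alongside the rules and knowledge.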
[0066] (1) Internet search: used to retrieve current hot events or emerging entities, making up for the limitations of knowledge cutoff in large models.
[0067] (2) Instruction and knowledge retrieval: As mentioned above, based on the semantics of the input content, Top-K related items are retrieved from the rule base and knowledge base.
[0068] (3) Content reverse lookup: used to extract historical content and user behavior within the same scenario or across scenarios to build the memory context.
[0069] (4) Content preprocessing: Standardize and clean the original input, including filtering HTML tags, irrelevant links, emojis and rich text noise, to ensure that the subsequent model focuses on effective semantic content.
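The content preprocessing step could be approximated with a few regular expressions, as in the sketch below. The patterns are assumptions: the emoji ranges cover only common blocks, and every URL is dropped for simplicity, whereas the source speaks of filtering "irrelevant" links without saying how relevance is decided.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")        # HTML or rich-text tags
URL_RE = re.compile(r"https?://\S+")   # http(s) links
# Common emoji and pictograph blocks plus the variation selector.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")
WS_RE = re.compile(r"\s+")


def preprocess(raw: str) -> str:
    """Strip tags, links, and emojis, then collapse whitespace."""
    text = TAG_RE.sub(" ", raw)       # replace tags with a space to keep words apart
    text = html.unescape(text)        # decode entities such as &amp;
    text = URL_RE.sub(" ", text)
    text = EMOJI_RE.sub("", text)
    return WS_RE.sub(" ", text).strip()
```

Tags are removed before links so that a closing tag glued to the end of a URL is not swallowed by the URL pattern.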
[0070] As described above, the context engineering module selectively and organically integrates instructions, knowledge, and memory through a tool-driven dynamic construction mechanism, generating highly adapted enhanced contexts for discrimination tasks of different stages and risk types. This not only improves the robustness of the model in risk identification in complex, adversarial, or ambiguous scenarios, but also ensures that the entire multi-stage risk control architecture maintains high throughput efficiency while possessing refined, interpretable, and iterative intelligent discrimination capabilities.
[0071] The above describes the context engineering module in the new content risk control architecture. Next, the method steps for risk detection based on this architecture will be introduced. It should be noted that the executing entity of this method can include servers, server clusters, cloud computing platforms, edge computing devices, distributed processing systems composed of multiple computing nodes, and other electronic devices with data processing and model inference capabilities. In specific application scenarios, the executing entity can be a content security platform, an intelligent risk control platform, or an integrated system such as an automated review system deployed in a data center. As shown in Figure 3, the risk detection method may include the following steps:
[0072] Step S310: Input the target content to be detected into the first large model to obtain a first prediction result, which indicates that the target content is high-risk, low-risk, or has an undetermined risk status.
Step S320: In response to the first prediction result indicating an undetermined risk status, input the target content and related risk discrimination rules into the second large model to obtain a second prediction result, which indicates a risk type to be detected or that the target content is low-risk.
Step S330: In response to the second prediction result indicating a risk type to be detected, call the third large model corresponding to that risk type to process the target content and related contextual information, generating a target risk detection result.
[0073] It should be noted that the terms "first" and "second" in phrases such as "first large model" and "second prediction result," as well as similar terms elsewhere in the text, are used to distinguish similar items and do not serve any other limiting function such as ranking or indicating importance. The steps described above are explained in detail below:
[0074] First, in step S310, the target content to be detected is input into the first large model to obtain the first prediction result, which indicates that the target content is high-risk, low-risk, or the risk status is undetermined.
[0075] The aforementioned target content is typically submitted or published by users on internet platforms. Typical scenarios include forum posts, comment replies, private messages, social media updates, live stream comments, product reviews, and Q&A content. From a content modality perspective, target content can be plain text, static images, audio / video clips, or multimodal content combining text and images / audio / video. For example, a promotional message containing persuasive text and a QR code image, or a short video containing sensitive audio, both fall under the scope of target content processed by the system.
[0076] Accordingly, the first large model can be selected dynamically from a preset risk screening model set based on the modality of the target content, or a unified multimodal screening model can be used. In one embodiment, if the target content is plain text (such as comments or posts), a large language model (LLM) is called as the first large model, such as a self-developed text understanding model based on the Transformer architecture; if the target content is an image or video, a multimodal large model is called, such as Contrastive Language-Image Pre-training (CLIP), Bootstrapping Language-Image Pre-training (BLIP), or a self-developed vision-language joint model; if the target content is audio, it can first be converted into text by an Automatic Speech Recognition (ASR) module and then processed by the large language model, or an end-to-end audio-text multimodal model can be used directly. In another embodiment, the system can deploy a unified multimodal initial screening model that can simultaneously process multiple inputs such as text, images, and audio to achieve modality-independent preliminary risk assessment.
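Modality-based selection of the first large model can be sketched as a registry lookup. Everything named here is hypothetical: `text_llm`, `vision_language_model`, and `asr` are stubs standing in for real models, and the modality keys are assumed labels.

```python
from typing import Callable, Dict, Optional

Model = Callable[[object], str]


def text_llm(text: object) -> str:
    """Stub for a text screening model; a real one would call an LLM."""
    return "pending"


def vision_language_model(media: object) -> str:
    """Stub for an image or video screening model."""
    return "pending"


def asr(audio: object) -> str:
    """Stub for speech recognition; returns a placeholder transcript."""
    return "transcribed text"


def audio_pipeline(audio: object) -> str:
    """Audio path from the text above: transcribe first, then screen as text."""
    return text_llm(asr(audio))


# The risk screening model set, keyed by modality (keys are assumptions).
SCREENING_MODELS: Dict[str, Model] = {
    "text": text_llm,
    "image": vision_language_model,
    "video": vision_language_model,
    "audio": audio_pipeline,
}


def select_first_model(modality: str,
                       registry: Dict[str, Model] = SCREENING_MODELS,
                       unified: Optional[Model] = None) -> Model:
    """Return the unified multimodal model if one is deployed, otherwise
    the registered model for this modality."""
    if unified is not None:
        return unified
    if modality not in registry:
        raise ValueError(f"no screening model registered for {modality!r}")
    return registry[modality]
```

Passing `unified` corresponds to the second embodiment, in which a single multimodal screening model handles every input type.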
[0077] The first large model can be a domain-fine-tuned large model whose training data covers a large number of labeled risk and compliance samples to ensure good generalization ability in content security tasks. Specific implementation methods include: (1) Instruction Tuning or Supervised Fine-Tuning (SFT) on an internal risk control dataset based on an open-source large model; (2) A fully self-developed large-scale pre-trained and fine-tuned model; (3) A lightweight student model obtained by compressing a heavy teacher model using knowledge distillation technology to balance inference efficiency and accuracy. Regardless of the source, the model is optimized for quickly outputting coarse-grained risk judgments. For example, the number of parameters in the first large model can be in the hundreds of millions.
[0078] It is worth noting that the input to the first large model includes, in addition to the target content itself, a first task description (or "risk screening task description"). This task description is provided as a natural language or structured prompt and states the task objective explicitly, for example: "Please assess whether the following content contains illegal, irregular, vulgar, or other high-risk behaviors, and output 'high risk,' 'low risk,' or 'risk status pending.'" The task description can be further refined to require the model to make a preliminary inference about the possible risk categories when the risk status is "risk status pending," for example: "If it cannot be determined whether it is illegal, but it is suspected to involve certain risk types, please indicate the suspected risk types." Under this setting, the output of the first large model not only covers the three risk statuses, but can also attach suspected risk labels when the risk status is pending, providing preliminary clues for subsequent task triage.
[0079] Based on the first prediction result, differentiated treatment is applied (see Figure 1):
[0080] Branch 1: If the result is "low risk", the target content is allowed to be published normally or to enter the content distribution process; if the target content has already been published, it is not deleted.
[0081] Branch 2: If the result is "high risk", the interception mechanism is triggered immediately to prevent the content from being published. At the same time, an alarm log can be generated or the content can be pushed to the manual review queue. Alternatively, if the target content has already been published, it can be deleted.
[0082] Branch 3: If the result is "risk status pending", the content is retained and no final decision is made. Instead, it is passed along with the relevant first context information (such as suspected risk category, original modal data, etc.) to the next stage, namely the task triage stage.
[0083] Through the above mechanism, the vast majority of content that is clearly compliant or obviously non-compliant is processed in the initial screening stage, which significantly reduces the number of samples that need to enter the high-cost fine-grained judgment stage, thereby effectively reducing the overall computing load of the system and improving the throughput and response efficiency of the end-to-end risk control pipeline.
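The three branches above can be illustrated with a minimal sketch. The names `RiskStatus`, `FirstPrediction`, and `dispatch_first_prediction`, as well as the returned action strings, are hypothetical and chosen only for illustration; the specification does not prescribe any particular data structure.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class RiskStatus(Enum):
    LOW = "low risk"
    HIGH = "high risk"
    PENDING = "risk status pending"


@dataclass
class FirstPrediction:
    status: RiskStatus
    # Suspected risk labels are only expected when the status is PENDING.
    suspected_types: List[str] = field(default_factory=list)


def dispatch_first_prediction(pred: FirstPrediction) -> str:
    """Map the first prediction result to one of the three branches."""
    if pred.status is RiskStatus.LOW:
        return "allow"      # Branch 1: publish, or keep if already published
    if pred.status is RiskStatus.HIGH:
        return "intercept"  # Branch 2: block or delete, optionally alert
    return "triage"         # Branch 3: forward to the task triage stage
```

Only the "triage" outcome incurs further model calls, which is where the reduction in downstream load comes from.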
[0084] Step S320: In response to the first prediction result indicating that the risk status is pending, the target content and the related risk discrimination rules are input into the second large model to obtain a second prediction result, which indicates either the risk type(s) to be detected or that the target content is low risk.
[0085] The aforementioned risk discrimination rules can be dynamically obtained through the instruction retrieval tool in the context engineering module. The retrieval criteria may include: (1) the semantic features of the target content itself (such as keywords, themes, sentiment tendencies, etc.); and (2) the suspected risk category information attached to the first prediction result. In this way, the Top-K (for example, the top 10) most relevant risk discrimination rules are retrieved from the preset instruction library and injected into the subsequent reasoning process as structured task constraints.
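As a hedged illustration of the Top-K retrieval described above, the following sketch ranks rules by cosine similarity between a content embedding and rule embeddings, and adds a bonus when a rule's category matches a suspected risk category. The function names, the additive `category_boost`, and the assumption that embeddings are already available are all illustrative; the specification does not state how the two retrieval criteria are combined.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k_rules(
    query_vec: Sequence[float],
    rule_vecs: Dict[str, Sequence[float]],
    rule_categories: Dict[str, str],
    suspected_types: Sequence[str],
    k: int = 10,
    category_boost: float = 0.2,  # hypothetical weighting, not from the source
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (rule_id, score) pairs."""
    scored = []
    for rule_id, vec in rule_vecs.items():
        score = cosine(query_vec, vec)
        # Criterion (2): favor rules in a suspected risk category.
        if rule_categories.get(rule_id) in suspected_types:
            score += category_boost
        scored.append((rule_id, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

In practice a vector index would likely replace the linear scan; the scan is used here only to keep the example self-contained.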
[0086] In addition to the risk discrimination rules, the system can also input retrieved domain knowledge into the second large model. This knowledge is also acquired in real time by the knowledge retrieval tool in the context engineering module, primarily based on the semantic representation of the target content. For example, when the target content contains expressions such as "guaranteed profit" or "insider information," the system can recall key-value (KV) knowledge (such as regulatory definitions and typical terminology) or chunk knowledge (such as fragments from a financial compliance encyclopedia) related to "illegal securities consulting" to enhance the model's understanding of professional risks. Knowledge and rules together constitute a semantically enhanced context for the target content, improving the accuracy of task triage.
[0087] In addition, the input to the second large model also includes a second task description (also known as a "triage task description"), which explicitly indicates the model's output objective in the form of natural language prompts or structured instructions. For example: "Based on the following content and the provided risk rules, determine whether there are any risk types that require further refinement; if so, list the specific risk types; if there are no obvious risks, output 'low risk'." This task description ensures that the model focuses on risk type identification and routing decisions, rather than directly outputting the final risk assessment.
[0088] Regarding the origin and characteristics of the second large model: it can adopt a technical approach similar to that of the first large model, such as fine-tuning an open-source large model, using a self-developed large model, or using a model optimized by SFT. For specific implementation details, please refer to the relevant description of the first large model. However, unlike the lightweight, high-throughput-oriented initial screening model, the second large model has greater architectural complexity, parameter scale, or inference depth, and possesses stronger multi-rule fusion understanding and fine-grained classification capabilities to support the differentiation and identification of various potential risk types. Its modality adaptation method is also consistent with that of the first large model: the corresponding LLM, CLIP, BLIP, or multimodal joint model can be selected according to the type of target content (text, image, audio / video, etc.).
[0089] Based on the second prediction result, the following actions can be taken (see Figure 1):
[0090] Branch A: If the output is "low risk", then terminate the process and allow the target content to be published normally.
[0091] Branch B: If the output is one or more "risk types to be detected", the task will be routed to the corresponding refined discrimination stage to call a dedicated refined discrimination model for in-depth analysis.
[0092] The core function of this stage (the task triage stage) is to accurately identify potential risk types and route tasks when the initial screening cannot determine the risk status. This is achieved by introducing a moderately complex model enhanced with rules and knowledge. On the one hand, this avoids blindly sending all "pending" samples into the high-cost refined judgment stage; on the other hand, it prevents missed judgments due to insufficient rule coverage. The stage thereby balances efficiency and accuracy, improving the resource utilization and risk coverage of the entire risk control pipeline.
[0093] Step S330: In response to the second prediction result indicating the type of risk to be detected, the third large model corresponding to the risk type is invoked to process the target content and related context information to generate a target risk detection result.
[0094] Regarding the composition and acquisition of the relevant context information used in this step: the relevant context information is a key enhancement input supporting the high-precision discrimination of the third large model. It is determined by one or more of the following methods (which can be used individually or in combination):
[0095] 1) Based on the type of risk to be detected, retrieve the corresponding instruction set from the preset risk discrimination rule base. For example, if the risk type is "illegal securities consultation", then recall personalized instructions and bottom-line risk instructions related to financial compliance.
[0096] 2) Based on the target content, retrieve relevant structured or unstructured knowledge from the preset domain knowledge base, including KV knowledge (such as "guaranteed profit" → "typical illegal stock recommendation rhetoric") or Chunk knowledge (such as fragments of financial regulatory policies).
[0097] 3) Obtain the historical submissions within a predetermined backtracking period (such as the most recent 24 hours or the most recent 5 entries) in the business scenario to which the target content belongs, in order to understand the local context of the current content.
[0098] 4) Obtain the content published by the publisher of the target content in other business scenarios (such as a user posting in a financial community and then commenting on related content on a social platform) to identify cross-scenario collaborative violations.
[0099] 5) Obtain the user profile characteristics and historical network behavior sequence of the publisher of the target content, including basic attributes (such as real-name status and registration duration), risk attributes (such as the number of historical violations), and key behavioral trajectories (such as frequent modification of information, batch addition of friends, and transfer remarks containing sensitive words).
[0100] All of the above information can be constructed in real time before reasoning by the tool components in the context engineering module (such as content reverse lookup, instruction / knowledge retrieval, etc.), forming a highly contextualized and personalized enhanced context.
[0101] To guide the third large model to focus on high-precision, interpretable risk assessment, a third task description (also known as a "refined judgment task description") can be injected during inference. In one design, this task description can be provided in the form of prompts, clearly defining the analytical framework, judgment principles, and output format to ensure that the model output is consistent, compliant, and aligned with business requirements.
[0102] The third task description can use the following prompts (for example only):
[0103] #Task Description
[0104] You are an experienced Chinese internet content moderation expert. Your task is to conduct a rigorous, step-by-step analysis based on all the content I provide, including "risk rules," "input content," "supplementary knowledge," "user information," and "background information," and ultimately provide a judgment conclusion that strictly adheres to the specified format.
[0105] #Core Mission
[0106] Determine whether the "input content" triggers the "risk rule".
[0107] #Analysis and Thinking Framework
[0108] 1. [Content Analysis]: First, please conduct a two-stage analysis:
[0109] (1) Risk signal scan (highest priority): [omitted here]
[0110] (2) Contextualized Interpretation (Default Path): [Omitted Here]
[0111] Based on this, we identify entities, behaviors, opinions, and emotional tendencies in the content and analyze their ultimate intentions in specific scenarios.
[0112] 2. [Content and Rule Matching Analysis]: First, [details omitted]. Second, directly compare the results of the "Content Analysis" with the "Risk Rules." Discuss which parts of the content directly or indirectly triggered the rules. If the content did not trigger the rules, please clearly explain why.
[0113] 3. Supplementary knowledge relevance and application analysis:
[0114] (1) Rigorous assessment of relevance: [omitted here]
[0115] (2) Carefully assess the application value: [omitted here]
[0116] 4. [Comprehensive Assessment]: Based on all the above analyses, a final weighing and summary is made, clearly explaining the logical chain that led to the final conclusion.
[0117] #Judgment Principles
[0118] Regarding "Rules and Regulations": [omitted here]
[0119] Regarding "uncertainty": [omitted here]
[0120] Regarding "Supplementary Knowledge": [omitted here]
[0121] Regarding the "risk signal priority principle": [omitted here]
[0122] Regarding the "principle of comprehensively judging intent and context": [omitted here]
[0123] Regarding the "principle of rule generalization (applying knowledge to other situations)": [omitted here]
[0124] Regarding the "most probable explanation principle" (default principle): [omitted here]
[0125] #Output format requirements
[0126] whether
[0127] [Explanation of Attribution]: xxx
[0128] #rule
[0129] {rules}
[0130] #Supplementary Knowledge
[0131] {knowledge}
[0132] #Background Information
[0133] {background}
[0134] #User Information
[0135] {account_info}
[0136] #Enter content
[0137] Current content being detected: {content}
[0138] Content context: {context}
[0139] In the above prompt example, placeholders such as {rules}, {knowledge}, {background}, {account_info}, {content}, and {context} are dynamically populated at runtime by the context engineering module with the actual retrieved risk discrimination rules, domain knowledge, business scenario background, user profile, target content, and their contextual information. This task description, through a strongly constrained reasoning framework, significantly improves the model's robustness and interpretability in complex, ambiguous, or adversarial scenarios.
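The placeholder filling described above can be sketched as follows, assuming the placeholders are filled by ordinary string formatting. The template is abbreviated to the sections that carry placeholders; `build_prompt` and `REQUIRED_FIELDS` are hypothetical names, and the omitted parts of the prompt are intentionally not reproduced.

```python
# Abbreviated template: only the placeholder-bearing sections of the example prompt.
PROMPT_TEMPLATE = (
    "#rule\n{rules}\n"
    "#Supplementary Knowledge\n{knowledge}\n"
    "#Background Information\n{background}\n"
    "#User Information\n{account_info}\n"
    "#Enter content\n"
    "Current content being detected: {content}\n"
    "Content context: {context}\n"
)

REQUIRED_FIELDS = ("rules", "knowledge", "background", "account_info", "content", "context")


def build_prompt(fields: dict) -> str:
    """Fill every placeholder; fail loudly if the context module missed one."""
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise KeyError(f"missing prompt fields: {missing}")
    return PROMPT_TEMPLATE.format(**{name: fields[name] for name in REQUIRED_FIELDS})
```

Failing on a missing field, rather than silently inserting an empty string, is a design choice of this sketch: it makes gaps in context construction visible before inference.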
[0140] The system can determine the third large model to be invoked in any of the following ways:
[0141] 1) Static mapping method: A fixed mapping relationship is pre-established between risk types and dedicated precision judgment models. For example, "illegal stock recommendations" corresponds to "financial compliance dedicated precision judgment model". This mapping relationship can be configured in the rule table or model routing table.
[0142] 2) Dynamic Scheduling Method: A large scheduling model (or model selector) is introduced. Its inputs include the modality of the target content (text / image / audio / video), the type of risk to be detected, and business scenario identifiers, etc., and the output is the identifier (ID or path) of the third large model. This scheduling mechanism supports flexible expansion to new risk types and models, improving system maintainability.
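The two selection methods can be illustrated with a minimal sketch. Combining them, so that the static table is consulted first and the scheduler acts as a fallback, is an assumption of this example; the specification presents them as alternatives. The model identifier `financial-compliance-judge` and the function names are hypothetical.

```python
from typing import Callable, Dict, Optional

# Static mapping: risk type -> identifier of the dedicated precision judgment model.
STATIC_ROUTES: Dict[str, str] = {
    "illegal stock recommendations": "financial-compliance-judge",
}

# A scheduler takes (modality, risk_type, scenario_id) and returns a model identifier.
Scheduler = Callable[[str, str, str], str]


def select_third_model(
    risk_type: str,
    modality: str,
    scenario_id: str,
    scheduler: Optional[Scheduler] = None,
) -> str:
    """Return the identifier of the third large model to invoke."""
    if risk_type in STATIC_ROUTES:
        return STATIC_ROUTES[risk_type]
    if scheduler is not None:
        # Dynamic scheduling, e.g. backed by a scheduling model or model selector.
        return scheduler(modality, risk_type, scenario_id)
    raise LookupError(f"no model registered for risk type: {risk_type!r}")
```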
[0143] The third large model significantly surpasses the first large model (risk screening) and the second large model (task triage) in parameter size, network depth, training data quality, and inference complexity. It is a medium-sized, specialized model designed for specific risk types, possessing enhanced semantic understanding, adversarial example identification, and contextual reasoning capabilities. Its technical implementation (e.g., whether it is based on fine-tuning an open-source large model, whether it employs SFT, etc.) follows the description of the first large model and is not repeated here.
[0144] When the second prediction result indicates multiple risk types to be detected, the system calls the third large model corresponding to each risk type to process the same target content and its related context information in parallel, obtaining an independent risk detection result from each dedicated precision judgment model. Finally, these results are aggregated to form a complete target risk detection result. This mechanism ensures that multidimensional risks are not overlooked and supports the comprehensive identification of complex violations.
[0145] The target risk detection results may include risk type, risk level (e.g., high / medium / low), confidence score, location of the violation segment, and a summary of the judgment criteria. Based on this result, the system can perform differentiated actions, such as: if determined to be "high risk," immediately blocking the content, freezing the account, or triggering manual review; if determined to be "medium risk," limiting the display, tagging and monitoring, or pushing to the review queue; if all dedicated precision judgment models output "low risk," then the content is allowed to be published normally.
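The parallel invocation and aggregation can be sketched as below, with each dedicated model represented by a plain callable. Taking the highest per-type risk level as the overall level is an assumption of this sketch; the specification only states that the content is published normally when every model outputs "low risk". All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

LEVEL_ORDER = {"low": 0, "medium": 1, "high": 2}

# Each judge takes (content, context) and returns a dict with at least a "level" key.
Judge = Callable[[str, dict], dict]


def run_refined_judgment(
    content: str,
    context: dict,
    risk_types: List[str],
    models: Dict[str, Judge],
) -> dict:
    """Call one dedicated model per risk type in parallel and aggregate."""
    with ThreadPoolExecutor(max_workers=max(1, len(risk_types))) as pool:
        futures = {rt: pool.submit(models[rt], content, context) for rt in risk_types}
        results = {rt: fut.result() for rt, fut in futures.items()}
    # Assumed aggregation rule: the most severe level wins.
    overall = max(
        (r["level"] for r in results.values()),
        key=LEVEL_ORDER.__getitem__,
        default="low",
    )
    return {"overall_level": overall, "per_type": results}
```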
[0146] The core function of this stage (the refined judgment stage) is as follows: based on the clear risk direction identified through task triage, highly complex and specialized large models are used, together with multi-dimensional contextual information and structured task descriptions, to achieve high-precision, high-confidence, and interpretable identification of difficult, hidden, or adversarial risk content. Although this stage has a high computational cost, it applies only to the high-value candidates among a small number of "pending" samples, thereby maximizing the detection rate and accuracy of key risks while preserving overall system throughput. It constitutes the "precision strike" link in the entire risk control pipeline.
[0147] In summary, risk detection is based on a novel content risk control architecture that utilizes multi-stage collaboration.
[0148] First, a three-tiered cascaded design of "initial screening—triage—refined judgment" achieves an optimal balance between computational resources and recognition accuracy. The lightweight initial screening model quickly filters out a large amount of clearly compliant or obviously non-compliant content, significantly reducing system load; the moderately complex triage model accurately identifies potential risk directions, avoiding blindly calling heavyweight models; and the highly specialized refined judgment model operates only on a small number of high-value and difficult samples, ensuring a high detection rate and low false positive rate for concealed and adversarial risks.
[0149] Secondly, by dynamically constructing enhanced contexts containing instructions, knowledge, memories, and tools through the context engineering module, and combining this with structured task descriptions to guide model reasoning, the interpretability, business alignment, and domain adaptability of model decisions are significantly improved. Especially in complex contexts (such as irony, metaphor, and multimodal content), the system maintains stable and reliable discrimination performance.
[0150] Furthermore, the entire risk control pipeline supports modal adaptation and dynamic expansion of risk types. Whether it is text, images, audio and video, or newly added vertical domain risks (such as the abuse of AI-generated content), they can be quickly adapted by updating the rule base, knowledge base, and dedicated models, which has good maintainability and foresight.
[0151] Finally, as the model parameter scale increases progressively at each stage, the system achieves a leap from "broad coverage" to "deep mining" while ensuring high throughput and low latency. This effectively solves the technical bottleneck of traditional single-model solutions that struggle to balance efficiency and accuracy, providing a high-performance, high-accuracy, and highly interpretable intelligent risk control solution for large-scale internet platforms.
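The three-stage flow summarized above can be sketched end to end. The stage models are passed in as plain callables, and all names (`cascade_detect`, `screen`, `triage`, `judges`, `build_context`) are hypothetical; the sketch only shows how early exits keep most content away from the costlier stages.

```python
from typing import Callable, Dict, List


def cascade_detect(
    content: str,
    screen: Callable[[str], str],                      # stage 1: returns a risk status
    triage: Callable[[str], List[str]],                # stage 2: returns risk types, or []
    judges: Dict[str, Callable[[str, dict], dict]],    # stage 3: one judge per risk type
    build_context: Callable[[str, List[str]], dict],   # context engineering step
) -> dict:
    """Run initial screening, triage, and refined judgment with early exits."""
    status = screen(content)
    if status in ("low risk", "high risk"):
        return {"stage": 1, "result": status}

    risk_types = triage(content)
    if not risk_types:
        return {"stage": 2, "result": "low risk"}

    # Enhanced context is built only for content that reaches stage 3.
    context = build_context(content, risk_types)
    results = {rt: judges[rt](content, context) for rt in risk_types}
    return {"stage": 3, "result": results}
```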
[0152] The aforementioned novel content risk control architecture not only demonstrates superior performance in real-time risk detection but also lays a solid foundation for the continuous iteration and self-evolution of the risk control system. On the one hand, it can perform quality inspection attribution based on historical decision samples, user feedback, and other data to pinpoint the causes of misjudgments or omissions and generate targeted optimization suggestions. On the other hand, it can utilize high-confidence detection results to construct high-quality samples for model retraining. Both can be implemented independently or combined collaboratively to drive closed-loop iteration and continuous improvement of risk control capabilities.
[0153] Next, we will first introduce the quality inspection attribution mechanism, and then introduce the model retraining.
[0154] As shown in Figure 4, the quality inspection attribution process may include the following steps:
[0155] Step S410: Construct target samples based on the target content and target risk detection results to determine the historical sample set; Step S420: Based on the historical sample set, use the fourth large model to identify misjudged samples; Step S430: Use the fifth large model to perform attribution analysis on the misjudged samples; Step S440: Use the sixth large model to process the analysis results of the attribution analysis to obtain optimization suggestions for risk detection.
[0156] The steps above are explained in detail below:
[0157] Step S410: Construct target samples based on the above target content and target risk detection results to determine the historical sample set.
[0158] In one implementation, the target samples can be first assigned to an initial sample set, and then downsampling can be achieved through semantic clustering to obtain a historical sample set.
[0159] Specifically, the detected content (such as original input text, image descriptions, or audio / video transcribed text, without contextual information) of each sample in the initial sample set can be semantically vectorized (e.g., by generating dense vectors using a large language model or a dedicated embedding model), and clustered based on vector similarity (e.g., K-means, DBSCAN, etc.) to obtain multiple semantic clusters. Then, each cluster is sampled separately (e.g., randomly selected or stratified sampling based on confidence level), and the complete samples corresponding to the sampled content are assigned to the historical sample set.
[0160] This implementation method can significantly reduce the data size while retaining typical samples in various semantic scenarios, effectively achieving deduplication and downsampling, and improving the efficiency of subsequent analysis.
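The clustering-based downsampling can be illustrated with a self-contained sketch. To avoid external dependencies it uses a simple greedy "leader" clustering with a cosine-similarity threshold in place of the K-means or DBSCAN mentioned above, and it assumes embeddings have already been computed. The function names and thresholds are hypothetical.

```python
import math
import random
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def leader_cluster(vectors: List[Sequence[float]], threshold: float = 0.9) -> List[List[int]]:
    """Greedy clustering: join the first cluster whose leader is similar enough."""
    clusters: List[List[int]] = []  # the first index in each cluster is its leader
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def downsample(samples: list, vectors: List[Sequence[float]],
               per_cluster: int = 1, threshold: float = 0.9, seed: int = 0) -> list:
    """Randomly keep up to per_cluster complete samples from each semantic cluster."""
    rng = random.Random(seed)
    kept = []
    for cluster in leader_cluster(vectors, threshold):
        chosen = rng.sample(cluster, min(per_cluster, len(cluster)))
        kept.extend(samples[i] for i in sorted(chosen))
    return kept
```

Note that the vectors are computed from the detected content only, while the kept items are the complete samples, matching the description above.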
[0161] In another implementation, the target samples can be first assigned to an initial sample set, and then boundary cases of decision inconsistency can be extracted from them and assigned to a historical sample set.
[0162] Specifically, several groups of samples that meet predetermined conditions can be identified from the initial sample set. These conditions include: the detected content of multiple samples within a group is highly similar semantically (e.g., vector cosine similarity exceeds a threshold). Since these samples are essentially identical or extremely similar, but the risk detection results output by the system differ (e.g., one is judged as "high risk," and another as "low risk"), this indicates that they are at the model's decision boundary and belong to high-value, difficult, or ambiguous cases. Including such sample groups as a whole in the historical sample set helps to focus on analyzing the root causes of model instability.
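A minimal sketch of the boundary-case extraction follows. It reports pairs rather than larger groups, uses a brute-force pairwise comparison, and assumes precomputed embeddings; the threshold value and function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_inconsistent_pairs(
    vectors: List[Sequence[float]],
    labels: List[str],
    threshold: float = 0.95,
) -> List[Tuple[int, int]]:
    """Return index pairs whose content is near-identical but whose results differ."""
    pairs = []
    for i, j in combinations(range(len(vectors)), 2):
        if labels[i] != labels[j] and cosine(vectors[i], vectors[j]) >= threshold:
            pairs.append((i, j))
    return pairs
```

The pairwise scan is quadratic in the number of samples, so for a large initial sample set an approximate nearest-neighbor index would likely be needed instead.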
[0163] The two implementation methods described above can be used independently—the former focuses on efficiently covering mainstream scenarios, while the latter focuses on accurately capturing boundary cases; they can also be used in combination. First, a representative sample set is obtained through clustering downsampling, and then sample groups with inconsistent decisions are selected from it, thus taking into account both breadth and depth, and providing a high-quality, high-information-density historical sample foundation for subsequent misjudgment identification and attribution analysis.
[0164] From the above, we can obtain the historical sample set.
[0165] Step S420: Based on the historical sample set, use the fourth model to identify misjudged samples.
[0166] It should be understood that the risk detection results involved in the historical sample set can be classified into the following four categories: (1) Correct pass: the content is low risk and is judged as low risk; (2) Correct delete: the content is high risk and is judged as high risk; (3) Incorrect pass (missed judgment): the content is actually high risk, but is judged as low risk; (4) Incorrect delete (misjudgment): the content is actually low risk, but is judged as high risk.
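The four categories can be expressed compactly in code. This sketch assumes a binary view in which each sample has an actual label and a judged label; `classify_outcome` and `is_misjudged` are hypothetical names.

```python
def classify_outcome(actual_high_risk: bool, judged_high_risk: bool) -> str:
    """Map (actual, judged) to one of the four categories."""
    if actual_high_risk and judged_high_risk:
        return "correct delete"
    if not actual_high_risk and not judged_high_risk:
        return "correct pass"
    if actual_high_risk and not judged_high_risk:
        return "incorrect pass"    # missed judgment
    return "incorrect delete"      # misjudgment


def is_misjudged(actual_high_risk: bool, judged_high_risk: bool) -> bool:
    """Categories (3) and (4): the judged label disagrees with the actual label."""
    return actual_high_risk != judged_high_risk
```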
[0167] For the sake of brevity, the samples of type (3) and type (4) mentioned above can be collectively referred to as misjudged samples, which are the core objects of quality inspection attribution and system optimization. To efficiently identify misjudged samples, the following implementation methods can be adopted:
[0168] The downsampled historical sample set (e.g., containing 100 samples) and its corresponding risk detection results are batch-input into the fourth model, which performs coarse-grained screening. The fourth model can autonomously determine whether to invoke contextual information based on the preset task description. For example, the task description might instruct the model to "first assess whether there are obvious decision-making contradictions or high-risk signals in the current sample; if so, selectively invoke contextual tools (such as knowledge retrieval and user profile queries) to assist in the judgment." The model outputs a list of misjudged sample indices (e.g., "Samples 1, 3, 5, and 7 require further analysis"), thus significantly reducing the size of the sample to be analyzed (e.g., from 100 to 15).
[0169] The above process enables efficient screening of high-value misjudged samples, laying the foundation for further in-depth analysis of the root causes of their errors.
[0170] Step S430: Attribution analysis is performed on the misjudged samples using the fifth large model.
[0171] Specifically, for the key misjudged samples selected by the fourth model, the fifth model (whose parameter scale and inference ability are superior to the fourth model) is invoked for in-depth analysis of each sample. This stage forcibly injects complete contextual information, including: the business scenario to which the target content belongs, the publisher's user profile, historical behavior sequences, same / cross-scenario context, and real-time background information obtained through online search (for example, for statements like "This game is really fun," the system considers whether the game experienced a serious server failure or a large-scale player complaint on that day to determine whether it is a sincere recommendation or a sarcastic complaint).
[0172] Building upon this, the fifth model not only verifies the true risk label of the sample (i.e., confirms whether it was mistakenly passed or mistakenly deleted), but more importantly, outputs the specific reasons that led to the misjudgment by the preceding model. For example, attribution may include:
[0173] 1) Lack of knowledge in the relevant field (e.g., lack of understanding of the background of the event).
[0174] 2) Missing contextual information (e.g., failure to obtain user history or the context of the content).
[0175] 3) The risk assessment rules are incomplete or the weights are unreasonable.
[0176] 4) The model lacks understanding of complex linguistic phenomena such as irony and metaphor.
[0177] The above review and attribution results for misjudged samples provide a direct basis for generating subsequent system optimization suggestions.
[0178] Step S440: Process the analysis results of the attribution analysis using the sixth large model to obtain optimization suggestions for risk detection.
[0179] It should be noted that the sixth model can be a very large-scale language model, with a higher number of parameters than the models mentioned above. Its input includes not only the attribution conclusions output by the fifth model, but also a comprehensive set of information from the target sample—covering the original detected content, complete contextual information (such as user profiles, historical behavior, cross-scene content, and real-time network background), detection results from previous stages, and the rules and knowledge fragments used. Based on this rich input, the sixth model can identify common defects at the system level and generate highly feasible optimization suggestions.
[0180] In some embodiments, the optimization suggestions output by the sixth model may include:
[0181] 1) Knowledge base supplementation: If the attribution indicates that the misjudgment is due to a lack of specific background information, it is recommended to extract the relevant background information retrieved from Internet tools into structured knowledge and supplement it into the preset domain knowledge base.
[0182] 2) Optimization of risk identification rules: If the attribution indicates that the existing rules fail to cover new violation patterns, it is recommended to add or adjust the identification rules for the corresponding risk types.
[0183] 3) Model inference path optimization: If the attribution points to a bias in the thought chain reasoning, it is recommended to optimize the guiding logic in the task description or the prompt template in the training data.
[0184] As an example, when the input is "Sample 1 was mistakenly deleted and Sample 2 was missed" along with its complete context and attribution, the sixth model can output: for satirical content related to games, add keywords of major operational incidents in the past 7 days to the knowledge base, and strengthen the priority judgment logic for the combination of "superficially positive wording + negative event" in the task description.
[0185] In summary, the quality inspection attribution mechanism achieves an automated closed loop from sample selection and root cause identification to strategy generation through the collaborative work of multiple large models. It is worth noting that the fourth, fifth, and sixth large models used in this mechanism are typically designed with larger overall parameter scales and inference capabilities than the large models used in the risk detection process. This ensures that the attribution analysis has higher judgment authority and depth of insight, effectively supporting the continuous evolution of the risk control system.
[0186] The above introduces the quality inspection attribution mechanism; the next step is to introduce the model retraining.
[0187] After generating the aforementioned target risk detection results, or after accumulating a sufficient amount of data in online risk control operations, the model retraining process can be initiated to achieve continuous evolution of risk control capabilities. This process can rely on a fully automated iterative optimization system covering data feedback, processing, management, training, evaluation, and deployment, forming a closed loop from decision-making to feedback and then to model upgrades.
[0188] (1) Construction and screening of high-quality training samples, which includes multi-source data backflow and automated cleaning.
[0189] Specifically, target samples can be constructed using the aforementioned target content and their corresponding target risk detection results, and then incorporated into the original sample set. This original sample set can not only include automatic detection results, but also integrate multi-source feedback data, including: manual quality inspection annotations, user reports / feedback, and reflective conclusions output by the upstream quality inspection attribution module, etc. These data together form a structured pool of original feedback data.
[0190] Since the raw data may contain noise, label skew, or low confidence issues, the system can execute an automated data processing workflow: through rule-based or label-based balanced sampling, invalid sample filtering, and combined with single-model multi-round inference voting or multi-model parallel evaluation mechanisms, high-reliability samples are selected based on preset confidence thresholds or multi-model consistency requirements. The resulting high-quality sample set serves as the core data source for subsequent training.
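The voting-based selection can be sketched as follows, assuming the votes come either from several inference rounds of one model or from several models evaluated in parallel. The agreement threshold and the name `select_reliable_label` are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def select_reliable_label(votes: List[str], min_agreement: float = 0.8) -> Optional[str]:
    """Return the majority label if its share of votes reaches the threshold.

    Returns None when there are no votes or agreement is too low, in which
    case the sample would be excluded from the high-quality sample set.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None
```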
[0191] (2) Generation of structured training data, including converting the selected samples into a standardized training format.
[0192] Based on high-quality samples, the system generates structured training data suitable for supervised fine-tuning (SFT). The typical format is a triple of "content – risk rule – hit & explanation attribution," where the explanation attribution can originate from the analysis results of the quality control attribution phase or the model's self-explanatory output. This data is standardized and proportionally divided into training and evaluation sets, forming mature data suitable for model training.
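A minimal sketch of the record format and the split is given below. The field names, the JSON Lines serialization, and the 10% default evaluation ratio are assumptions for illustration; the specification only describes the "content – risk rule – hit & explanation attribution" structure and a proportional split.

```python
import json
import random
from typing import List, Tuple


def to_sft_record(content: str, rule: str, hit: bool, explanation: str) -> dict:
    """Build one training record from a selected high-quality sample."""
    return {"content": content, "risk_rule": rule, "hit": hit, "explanation": explanation}


def split_records(records: List[dict], eval_ratio: float = 0.1,
                  seed: int = 0) -> Tuple[List[dict], List[dict]]:
    """Shuffle deterministically and split into (training set, evaluation set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_eval = int(len(shuffled) * eval_ratio)
    return shuffled[n_eval:], shuffled[:n_eval]


def to_jsonl(records: List[dict]) -> str:
    """Serialize records as JSON Lines, keeping non-ASCII text readable."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```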
[0193] (3) Model training strategy
[0194] Currently, the SFT method is mainly used to train the large models used in the aforementioned risk detection and quality inspection attribution processes. Considering that most open-source large models do not natively possess the ability to identify risks in specific domains, the routing layer and analysis decision layer models must be trained specifically before deployment.
[0195] The training platform can use mainstream open-source large models or safe and compliant self-developed large models, and supports a variety of fine-tuning strategies, such as full fine-tuning (FFT).
[0196] (4) Multi-dimensional model evaluation and deployment to ensure the safety and effectiveness of iterative models.
[0197] After training, the new model must undergo multi-dimensional evaluation before it can be deployed: core metrics (such as recall rates for various risks and white sample disturbance rates) are evaluated on industry-standard or authoritative third-party benchmarks; dual-run comparisons are conducted on historical real business traffic to verify the consistency of decisions with the old model; and the interception effect and user experience impact of the new model in real-world scenarios are evaluated through real-time A / B experiments.
[0198] Through the above mechanism, the system not only achieves continuous improvement in model capabilities, but also ensures that each iteration comes with measurable, explainable, and rollback-capable security guarantees.
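The offline part of this evaluation can be sketched as follows. The sketch assumes that the "white sample disturbance rate" is the fraction of risk-free samples that the model flags as risky, and that dual-run consistency is the fraction of identical decisions between the old and new models; both definitions, and all function names, are assumptions rather than definitions given in this specification.

```python
def recall_by_risk(y_true, y_pred, no_risk="no_risk"):
    """Recall per risk type: correctly identified samples / all samples of that type."""
    totals, hits = {}, {}
    for t, p in zip(y_true, y_pred):
        if t == no_risk:
            continue
        totals[t] = totals.get(t, 0) + 1
        if p == t:
            hits[t] = hits.get(t, 0) + 1
    return {risk: hits.get(risk, 0) / n for risk, n in totals.items()}


def white_disturbance_rate(y_true, y_pred, no_risk="no_risk"):
    """Fraction of risk-free ("white") samples that were flagged as risky."""
    white = [p for t, p in zip(y_true, y_pred) if t == no_risk]
    if not white:
        return 0.0
    return sum(p != no_risk for p in white) / len(white)


def decision_consistency(old_decisions, new_decisions):
    """Fraction of replayed traffic on which the old and new models agree."""
    same = sum(o == n for o, n in zip(old_decisions, new_decisions))
    return same / len(old_decisions)


y_true = ["R1", "R1", "R2", "no_risk", "no_risk", "no_risk", "no_risk"]
y_new = ["R1", "no_risk", "R2", "no_risk", "R1", "no_risk", "no_risk"]
y_old = ["R1", "R1", "R2", "no_risk", "no_risk", "no_risk", "R2"]

recalls = recall_by_risk(y_true, y_new)
disturbance = white_disturbance_rate(y_true, y_new)
consistency = decision_consistency(y_old, y_new)
```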
[0199] According to another embodiment, for the above "(3) model training strategy", the applicant proposes that a reinforcement learning (RL) mechanism can be introduced to align the different requirements of risk recall rate and risk-free content disturbance rate for different business scenarios.
[0200] It should be noted that RL is primarily applied to the set of dedicated precision judgment models within the risk detection process. The reward function involved in RL takes the form of a reward score table, where the value in the i-th row and j-th column represents the reward score for classifying a sample of the i-th risk type as the j-th risk type. For example, the reward score table can be Table 1 below, where Ri denotes a specific risk type.
[0201] Table 1: Reward Score Table
[0202]
[0203] The element in the first row and fourth column of Table 1 is -20, which indicates that when a sample with the true label of R1 risk is incorrectly classified as "no risk" (i.e., missed), the corresponding reward score is -20. This high negative value indicates that the system imposes a severe penalty on missed classifications of R1 risk, reflecting an extremely low tolerance for this type of risk.
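A reward score table of this kind can be represented as a simple matrix lookup. In the sketch below, only the value -20 in the first row and fourth column comes from the description above; the number of risk types, the label names, and every other score are placeholders chosen for illustration.

```python
# Row i = true label, column j = predicted label.
LABELS = ["R1", "R2", "R3", "no_risk"]

REWARD_TABLE = [
    # R1   R2   R3   no_risk   (predicted)
    [10,  -5,  -5,  -20],  # true R1: a miss costs -20 (value from the text)
    [-5,  10,  -5,  -10],  # true R2 (placeholder scores)
    [-5,  -5,  10,   -8],  # true R3 (placeholder scores)
    [-3,  -3,  -3,    5],  # true no_risk (placeholder scores)
]


def reward(true_label, predicted_label):
    """Look up the reward for classifying a `true_label` sample as `predicted_label`."""
    i = LABELS.index(true_label)
    j = LABELS.index(predicted_label)
    return REWARD_TABLE[i][j]


missed_r1 = reward("R1", "no_risk")
```

Making the off-diagonal entries asymmetric is what lets a business scenario penalize misses on one risk type more heavily than false alarms on risk-free content.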
[0204] Furthermore, the following is an example of an RL learning process:
[0205] For each sample, the current policy (corresponding to the current version of the third large model mentioned above) makes n independent predictions, and the average reward of the sample is calculated as in formula (1).
[0206] The relative advantage of each prediction is then calculated as in formula (2).
[0207] The objective function to be optimized is given by formula (3).
[0208] The superscript GRPO stands for Generalized Reinforcement Policy Optimization; see formula (4).
[0209] For the meaning of each mathematical symbol in the above formulas (1)-(4), please refer to Table 2 below:
[0210] Table 2
[0211]
[0212] Therefore, we can introduce a reinforcement learning (RL) mechanism to enable the model to explicitly weigh the costs of missed and false positives during training, which is more in line with actual business objectives.
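The per-sample step of the learning process described above (n independent predictions, their average reward, and the relative advantage of each prediction) can be sketched as follows. The sketch assumes the advantage is the reward minus the group average; the optional division by the standard deviation is a common variant in group-relative methods and is an assumption here, not something stated in this specification. The reward values are illustrative.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, normalize=False, eps=1e-8):
    """Return (average reward, per-prediction advantages) for one sample.

    rewards: reward scores of the n independent predictions for the sample,
             for example looked up from a reward score table.
    """
    baseline = mean(rewards)  # average reward over the n predictions
    advantages = [r - baseline for r in rewards]
    if normalize:
        scale = pstdev(rewards) + eps  # assumed variant: scale by standard deviation
        advantages = [a / scale for a in advantages]
    return baseline, advantages


# Four predictions for one R1 sample: three correct (+10), one miss (-20).
rewards = [10, 10, -20, 10]
baseline, advantages = group_relative_advantages(rewards)
_, normalized = group_relative_advantages(rewards, normalize=True)
```

The missed prediction receives a strongly negative advantage relative to its siblings, so the policy update pushes probability away from that output for this sample.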
[0213] The above describes risk detection, quality inspection attribution, and model retraining for the target content itself in content risk control scenarios. In practice, content risk control sometimes also requires risk detection of the entity publishing the target content (usually a user). In this case, the aforementioned novel multi-stage collaborative risk control architecture can be migrated and adapted to the entity risk identification task, maintaining the same core idea but with corresponding adjustments to the input information and the judgment target.
[0214] Specifically, the main differences are as follows. In content risk detection, the system focuses on determining "whether the content violates regulations," and the input consists primarily of content modality data, supplemented by context enhancement. In risk detection of the publishing entity, the system instead aims to determine "whether the user exhibits high-risk behavioral tendencies," so the input must include both the target content and the user characteristics of its publisher (such as registration information, historical behavior sequences, violation records, and device fingerprints), and the granularity of risk assessment is raised from "content" to "entity." The remaining architectural design, including the three-level cascaded process (initial screening, routing, precise judgment), the dynamic injection mechanism of the context engineering module, task description guidance, and dedicated model routing by risk type, is highly consistent with the aforementioned content risk detection solution. For other details, refer to the description above of Figures 1 to 3 and the related embodiments.
[0215] Specifically, Figure 5 illustrates the risk detection process for the content publisher, which includes the following steps:
[0216] Step S510: Input the target content to be detected and the user characteristics of the user who published it into the first large model to obtain a first prediction result, which indicates that the user who published it is a high-risk user, a low-risk user, or the risk status is pending. Step S520: In response to the first prediction result indicating that the risk status is pending, input the target content, user characteristics, and related risk discrimination rules into the second large model to obtain a second prediction result, which indicates that there is a user risk type to be detected or that the user who published it is a low-risk user. Step S530: In response to the second prediction result indicating that there is a user risk type to be detected, call the third large model corresponding to the user risk type to process the target content, user characteristics, and related context information to generate a target risk detection result.
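The control flow of steps S510 to S530 can be sketched as follows. The three models and the two retrieval helpers are passed in as plain callables; the stub implementations, label strings such as `"pending"`, and all parameter names are hypothetical stand-ins for the large models and the context engineering module, not their actual interfaces.

```python
def detect_publisher_risk(content, user_features, first_model, second_model,
                          third_models, retrieve_rules, retrieve_context):
    # S510: initial screening on content plus user characteristics.
    first = first_model(content, user_features)
    if first in ("high_risk", "low_risk"):
        return {"stage": 1, "result": first}

    # S520: risk status pending, so route using risk discrimination rules.
    rules = retrieve_rules(content)
    second = second_model(content, user_features, rules)
    if second == "low_risk":
        return {"stage": 2, "result": "low_risk"}

    # S530: call the third model dedicated to the detected user risk type.
    context = retrieve_context(content, user_features)
    result = third_models[second](content, user_features, context)
    return {"stage": 3, "risk_type": second, "result": result}


# Stub components for illustration only.
def first_model(content, user_features):
    if user_features.get("violations", 0) >= 5:
        return "high_risk"
    if user_features.get("verified") and user_features.get("violations", 0) == 0:
        return "low_risk"
    return "pending"


def second_model(content, user_features, rules):
    return "fraud" if any(keyword in content for keyword in rules) else "low_risk"


stubs = dict(
    first_model=first_model,
    second_model=second_model,
    third_models={"fraud": lambda c, u, ctx: {"confirmed": ctx["reports"] > 0}},
    retrieve_rules=lambda content: ["transfer"],
    retrieve_context=lambda content, user_features: {"reports": 2},
)

screened = detect_publisher_risk("hello", {"verified": True, "violations": 0}, **stubs)
routed = detect_publisher_risk("nice weather", {"violations": 1}, **stubs)
judged = detect_publisher_risk("please transfer money", {"violations": 1}, **stubs)
```

Only the third call reaches the most expensive stage, which is the point of the cascade: most traffic exits at stage 1 or 2.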
[0217] It should be noted that, for clarity and ease of understanding, terms such as "first large model" and "first prediction result" are reused above.
[0218] In this type of subject-based risk detection, the architecture of the context engineering module does not need to be changed. However, in the initial screening stage, it needs to dynamically retrieve and inject features related to the publishing subject through tool components (such as content reverse lookup and user profile query) to support the first large model's preliminary judgment on user risk tendencies. The retrieved user features may include: 1) Basic attributes: such as real-name authentication status, registration duration, and regional information; 2) Risk attributes: such as the number of historical violations, the frequency of being reported, and account penalty records; 3) Behavioral sequences: such as recent posting frequency, content modification frequency, and private message / transfer behavior patterns; 4) Related features: such as device fingerprint reuse, multi-account binding relationships, and social graph anomalies.
[0219] These features are acquired in real time by tools such as "content reverse lookup" in the context engineering module, and are input into the first large model along with the target content as structured context, so that risk prediction in the initial screening stage can draw on subject-dimension information.
[0220] Although the input elements and judgment targets have been adjusted, the entire processing flow still strictly follows the "light-medium-heavy" three-tiered cascaded architecture illustrated in Figure 1: the initial screening stage quickly filters low-risk users; the routing stage identifies potential risk types based on rules and knowledge; and the precise judgment stage calls the corresponding dedicated model and combines the complete context to produce a high-confidence risk profile of the subject. Therefore, this migration is not an architectural reconstruction, but rather a natural extension and reuse of the original risk control system along a new judgment dimension.
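One way to package the four feature groups as structured context for the initial screening stage is sketched below. The class name, field names, and the JSON layout are assumptions chosen for illustration; the specification does not prescribe a serialization format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PublisherFeatures:
    basic: dict = field(default_factory=dict)      # e.g. real-name status, registration duration
    risk: dict = field(default_factory=dict)       # e.g. historical violations, report frequency
    behavior: dict = field(default_factory=dict)   # e.g. recent posting frequency
    relations: dict = field(default_factory=dict)  # e.g. device fingerprint reuse


def build_screening_input(content, features):
    """Serialize target content and publisher features into one structured payload."""
    payload = {"target_content": content, "publisher_features": asdict(features)}
    return json.dumps(payload, ensure_ascii=False, sort_keys=True)


features = PublisherFeatures(
    basic={"real_name_verified": True, "registration_days": 12},
    risk={"historical_violations": 2, "reports_last_30d": 4},
    behavior={"posts_last_24h": 37},
    relations={"accounts_on_same_device": 5},
)
screening_input = build_screening_input("limited-time offer, message me", features)
```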
[0221] For information on quality inspection attribution and model retraining, please refer to the aforementioned introduction on risk control of the content itself, which will not be repeated here. Furthermore, in actual business operations, risk detection of both the target content itself and the publishing user can be performed in parallel for the target content published by the user.
[0222] Based on the risk control solution targeting the aforementioned content publishers, it can bring the following beneficial effects in actual business: 1) Enhanced foresight in risk identification: By integrating user behavior sequences and profile features, potential high-risk entities can be identified before content violations become apparent, enabling early warning and proactive intervention. 2) Enhanced coverage of adversarial risks: Combining entity intent with contextual behavior patterns, it effectively identifies hidden violations that circumvent keyword detection, reducing the false negative rate. 3) Optimized resource utilization efficiency: Reusing the three-level cascaded architecture, it triggers high-cost precision judgment only for suspicious entities, balancing system throughput and judgment accuracy. 4) Support for refined governance strategies: Outputting structured entity risk tags provides a basis for tiered handling (such as traffic limiting, freezing, and monitoring), improving the balance between risk control and user experience. 5) Strengthened closed-loop evolution capability of the system: Entity risk samples can drive the collaborative iteration of rules, knowledge, and models, continuously improving the system's adaptability and robustness to new risks.
[0223] In summary, extending the multi-stage collaborative risk control architecture to subject risk detection not only expands the application boundaries of the original technical solution, but also achieves a leap from "treating the symptoms" to "treating the root cause" in practical application, providing key support for building a smart, efficient, and explainable next-generation risk control system for large-scale Internet platforms.
[0224] Corresponding to the risk detection methods described above, the embodiments of this specification also disclose risk detection devices.
[0225] The content risk detection device 600 shown in Figure 6 includes the following functional units:
[0226] The first prediction unit 610 is configured to input the target content to be detected into a first large model to obtain a first prediction result, which indicates that the target content is high-risk, low-risk, or has an undetermined risk status. The second prediction unit 620 is configured to, in response to the first prediction result indicating an undetermined risk status, input the target content and related risk discrimination rules into a second large model to obtain a second prediction result, which indicates a risk type to be detected or that the target content is low-risk. The third prediction unit 630 is configured to, in response to the second prediction result indicating a risk type to be detected, call a third large model corresponding to that risk type to process the target content and related context information, and generate a target risk detection result.
[0227] In one embodiment, the content risk detection device 600 further includes a model selection unit 640 configured to select, based on the modality of the target content, the first large model that supports the modality from the risk screening model set.
[0228] In one embodiment, the content risk detection device 600 further includes a content processing unit 650 configured to intercept the target content in response to the first prediction result indicating high risk, or to allow the publication of the target content in response to the first prediction result indicating low risk.
[0229] In one embodiment, the second prediction unit 620 is specifically configured to: respond to the first prediction result indicating that the risk status of several risk types is pending, retrieve a preset risk discrimination rule library based on the several risk types to obtain the corresponding risk discrimination rules; and input the target content and the retrieved risk discrimination rules into the second large model.
[0230] In one embodiment, the second prediction unit 620 is specifically configured to: respond to the first prediction result indicating that the risk status is pending, retrieve a preset risk discrimination rule library based on the target content to obtain relevant risk discrimination rules; and input the target content and the retrieved risk discrimination rules into the second large model.
[0231] In one embodiment, the content risk detection device 600 further includes a content processing unit that, in response to the second prediction result indicating low risk, allows the publication of the target content.
[0232] In one embodiment, the risk type to be detected comprises multiple risk types; the third prediction unit 630 is specifically configured to: call multiple third large models corresponding respectively to the multiple risk types to process the target content and related context information, obtain the risk detection result output by each of the third large models, and form the target risk detection result.
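Calling several dedicated third large models for one piece of content can be sketched as a fan-out over risk types. Running the calls concurrently in a thread pool is an assumption made for the example (the text only states that the models are called respectively); the stub models and names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_precise_judgment(content, context, risk_types, third_models):
    """Call the dedicated model of every risk type and collect the results."""
    if not risk_types:
        return {}
    with ThreadPoolExecutor(max_workers=len(risk_types)) as pool:
        futures = {
            risk_type: pool.submit(third_models[risk_type], content, context)
            for risk_type in risk_types
        }
        # The combined per-type results form the target risk detection result.
        return {risk_type: fut.result() for risk_type, fut in futures.items()}


# Stub dedicated models for illustration only.
third_models = {
    "fraud": lambda content, context: "hit" if "transfer" in content else "miss",
    "spam": lambda content, context: "hit" if content.count("!") > 3 else "miss",
}
target_result = run_precise_judgment(
    "please transfer money now", {"history": []}, ["fraud", "spam"], third_models
)
```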
[0233] In one embodiment, determining the relevant context information includes: retrieving a preset risk discrimination rule base based on the risk type to be detected to obtain the corresponding risk discrimination rule; and / or, retrieving a preset domain knowledge base based on the target content to obtain relevant domain knowledge; and / or, obtaining historical submission content within a predetermined backtracking period before the target content was submitted in the business scenario to which the target content belongs; and / or, obtaining content published by the publisher of the target content in other business scenarios; and / or, obtaining user profile features and historical network behavior sequences of the publisher of the target content.
[0234] In one embodiment, the content risk detection device 600 further includes a quality inspection attribution unit 660, configured to: construct a target sample based on the target content and the target risk detection result to determine a historical sample set; determine misjudged samples from the historical sample set using a fourth large model; perform attribution analysis on the misjudged samples using a fifth large model; and process the analysis results of the attribution analysis using a sixth large model to obtain optimization suggestions for risk detection.
[0235] Furthermore, in a specific embodiment, determining the historical sample set includes: incorporating the target sample set into the initial sample set; performing semantic clustering on multiple detected contents involved in the initial sample set to obtain multiple clusters; sampling each of the multiple clusters, and incorporating the samples corresponding to the sampled contents into the historical sample set.
[0236] In another specific embodiment, the determination of the historical sample set includes: classifying the target sample set into the initial sample set; determining several groups of samples from the initial sample set that meet predetermined conditions, wherein the predetermined conditions are: the detected content of multiple samples within a group is semantically similar, but the risk detection results are different; and classifying the several groups of samples into the historical sample set.
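The predetermined condition above (semantically similar content with different risk detection results) can be sketched as a pairwise scan. Token-level Jaccard similarity is used here purely as a lightweight stand-in for semantic similarity, and the 0.6 threshold, the sample fields, and the function names are all assumptions; a real system would more likely compare embeddings.

```python
from itertools import combinations


def jaccard(a, b):
    """Token-overlap similarity, used as a stand-in for semantic similarity."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def find_conflicting_pairs(samples, threshold=0.6):
    """Return id pairs whose content is similar but whose results differ."""
    pairs = []
    for s1, s2 in combinations(samples, 2):
        similar = jaccard(s1["content"], s2["content"]) >= threshold
        if similar and s1["result"] != s2["result"]:
            pairs.append((s1["id"], s2["id"]))
    return pairs


initial_samples = [
    {"id": 1, "content": "buy cheap pills online now", "result": "R1"},
    {"id": 2, "content": "buy cheap pills online today", "result": "no_risk"},
    {"id": 3, "content": "weather is nice today", "result": "no_risk"},
]
conflicts = find_conflicting_pairs(initial_samples)
```

Samples 1 and 2 differ by one word yet received different results, so the pair is a natural candidate for the misjudgment check by the fourth large model.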
[0237] In one specific embodiment, the quality inspection attribution unit 660 is configured to perform attribution analysis on the misjudged samples using a fifth large model, which specifically includes: processing the misjudged samples and their corresponding context information using the fifth large model, wherein the context information includes relevant knowledge retrieved from a knowledge base and background information searched using Internet tools; the attribution analysis indicates that the cause of the misjudgment includes a lack of the background information, and the optimization suggestions include supplementing the knowledge base with knowledge extracted from the background information.
[0238] In one embodiment, the content risk detection device 600 further includes a retraining unit 670, configured to: construct target samples using the target content and target risk detection results, and include them in the original sample set; and select a high-quality sample set from the original sample set for training the third large model.
[0239] Furthermore, in a specific embodiment, the retraining unit 670 is specifically configured to: train the third large model using the high-quality sample set and reinforcement learning; wherein the reward function involved in the reinforcement learning is in the form of a reward score table, where the value in the i-th row and j-th column of the table represents the reward score for classifying a sample of the i-th risk type as the j-th risk type.
[0240] In one embodiment, the parameter sizes of the first, second, and third large models increase progressively.
[0241] The subject risk detection device 700 shown in Figure 7 includes the following functional units:
[0242] The first prediction unit 710 is configured to input the target content to be detected and the user characteristics of the user who published it into a first large model to obtain a first prediction result, which indicates that the user who published it is a high-risk user, a low-risk user, or a risk status pending. The second prediction unit 720 is configured to, in response to the first prediction result indicating a risk status pending, input the target content, user characteristics, and related risk discrimination rules into a second large model to obtain a second prediction result, which indicates a user risk type to be detected or that the user who published it is a low-risk user. The third prediction unit 730 is configured to, in response to the second prediction result indicating a user risk type to be detected, call a third large model corresponding to that user risk type to process the target content, user characteristics, and related contextual information to generate a target risk detection result.
[0243] In one embodiment, the subject risk detection device 700 further includes a model selection unit 740, which selects, based on the modality of the target content, the first large model supporting the modality from the risk screening model set. It should be understood that user characteristics are generally structured or semi-structured data and do not require consideration of modality.
[0244] In one embodiment, the subject risk detection device 700 further includes a content processing unit 750 configured to block the target content in response to the first prediction result indicating a high-risk user, or to allow the publication of the target content in response to the first prediction result indicating a low-risk user.
[0245] In one embodiment, the second prediction unit 720 is specifically configured to: respond to the first prediction result indicating that the risk status of several risk types is pending, retrieve a preset risk discrimination rule library based on the several risk types to obtain the corresponding risk discrimination rules; and input the target content and the retrieved risk discrimination rules into the second large model.
[0246] In one embodiment, the second prediction unit 720 is specifically configured to: respond to the first prediction result indicating that the risk status is pending, retrieve a preset risk discrimination rule library based on the target content to obtain relevant risk discrimination rules; and input the target content and the retrieved risk discrimination rules into the second large model.
[0247] In one embodiment, the subject risk detection device 700 further includes a content processing unit configured to allow the publication of the target content in response to the second prediction result indicating a low-risk user.
[0248] In one embodiment, the risk type to be detected comprises multiple risk types; the third prediction unit 730 is specifically configured to: call multiple third large models corresponding respectively to the multiple risk types to process the target content, user characteristics, and related context information, obtain the risk detection result output by each of the third large models, and form the target risk detection result.
[0249] In one embodiment, determining the relevant context information includes: retrieving a preset risk discrimination rule base based on the risk type to be detected to obtain the corresponding risk discrimination rule; and / or, retrieving a preset domain knowledge base based on the target content to obtain relevant domain knowledge; and / or, obtaining historical submission content within a predetermined backtracking period before the target content was submitted in the business scenario to which the target content belongs; and / or, obtaining content published by the publisher of the target content in other business scenarios; and / or, obtaining user profile features and historical network behavior sequences of the publisher of the target content.
[0250] In one embodiment, the subject risk detection device 700 further includes a quality inspection attribution unit 760, configured to: construct a target sample based on the target content, user characteristics, and target risk detection results to determine a historical sample set; determine misjudged samples from the historical sample set using a fourth large model; perform attribution analysis on the misjudged samples using a fifth large model; and process the analysis results of the attribution analysis using a sixth large model to obtain optimization suggestions for risk detection.
[0251] Furthermore, in a specific embodiment, determining the historical sample set includes: incorporating the target sample set into the initial sample set; performing semantic clustering on multiple detected content-user features involved in the initial sample set to obtain multiple clusters; sampling each of the multiple clusters, and incorporating the samples corresponding to the sampling results into the historical sample set.
[0252] In another specific embodiment, the determination of the historical sample set includes: classifying the target sample set into an initial sample set; determining several groups of samples from the initial sample set that meet predetermined conditions, wherein the predetermined conditions are: the detected content-user feature semantics of multiple samples within a group are similar, but the risk detection results are different; and classifying the several groups of samples into the historical sample set.
[0253] In one specific embodiment, the quality inspection attribution unit 760 is configured to perform attribution analysis on the misjudged samples using the fifth large model, including: processing the misjudged samples and corresponding context information using the fifth large model, wherein the context information includes relevant knowledge retrieved from the knowledge base and background information searched using Internet tools; wherein the attribution analysis indicates that the cause of the misjudgment includes the lack of the background information, and the optimization suggestions include supplementing the knowledge base with knowledge extracted from the background information.
[0254] In one embodiment, the subject risk detection device 700 further includes a retraining unit 770, configured to: construct target samples using the target content, user features, and target risk detection results, and include them in the original sample set; and select a high-quality sample set from the original sample set for training the third large model.
[0255] Furthermore, in a specific embodiment, the retraining unit 770 is specifically configured to: train the third large model using the high-quality sample set and reinforcement learning; wherein the reward function involved in the reinforcement learning is in the form of a reward score table, where the value in the i-th row and j-th column of the table represents the reward score for classifying a sample of the i-th risk type as the j-th risk type.
[0256] In one embodiment, the parameter sizes of the first, second, and third large models increase progressively.
[0257] It should be noted that for a description of the above functional units, please refer to the relevant description of the process method in the foregoing embodiments.
[0258] In this specification, Large Language Models (LLMs) may also be referred to simply as Large Models. A Large Language Model is a natural language processing model based on deep learning techniques, typically with billions to hundreds of billions or even more parameters, possessing powerful language understanding and generation capabilities. Large Language Models can employ the Transformer architecture or its variants (such as GPT, BERT, etc.), which utilizes an attention mechanism to globally model sequential data, efficiently handling long-distance dependencies and thus performing exceptionally well in natural language tasks. Large Language Models learn the statistical features and semantic relationships of language through pre-training on large-scale corpora, giving them excellent generalization capabilities. The core capabilities of Large Language Models include, but are not limited to: understanding contextual semantics, generating coherent and grammatically correct text, performing logical reasoning, and handling multi-task scenarios. Their usage typically includes two modes: direct inference and fine-tuning. In direct inference mode, the user guides the Large Language Model to generate specific outputs by designing prompts. Prompts can be task descriptions or instructions in text form, used to stimulate the Large Language Model's semantic understanding and generation capabilities. In fine-tuning mode, large language models are further trained on small-scale datasets within a specific domain to optimize their performance on specific tasks. The powerful generalization capabilities and flexibility of large language models make them an important tool in the field of artificial intelligence, providing efficient and accurate solutions for automated text generation and understanding.
[0259] In some embodiments, large language models can also understand and generate data from other modalities (such as visual and audio data). In this case, large language models can also be called multimodal large language models (MLLMs). MLLMs provide a richer and more natural interactive experience by integrating multiple types of input and output, such as text, images, and sound. The core advantage of MLLMs lies in their ability to process and understand information from different modalities and fuse this information to complete complex tasks. For example, MLLMs can analyze an image and generate descriptive text, or generate a corresponding image based on a text description. This cross-modal understanding and generation capability makes MLLMs widely applicable across multiple fields.
[0260] It should be noted that the key technologies of large language models can be found in the detailed description in the paper "A Survey of Large Language Models" (paper number: arXiv:2303.18223v16, published on March 11, 2025), and will not be repeated here.
[0261] According to another embodiment, a computer-readable storage medium is also provided, on which a computer program is stored, which, when executed in a computer, causes the computer to perform the method described with reference to Figure 3, Figure 4, or Figure 5.
[0262] According to another embodiment, a computing device is also provided, including a memory and a processor, wherein the memory stores executable code, and the processor executes the executable code to implement the method described with reference to Figure 3, Figure 4, or Figure 5.
[0263] Those skilled in the art will recognize that, in one or more of the examples above, the functions described in this invention can be implemented using hardware, software, firmware, or any combination thereof. When implemented in software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
[0264] The specific embodiments described above further illustrate the purpose, technical solution, and beneficial effects of the present invention. It should be understood that the above description is only a specific embodiment of the present invention and is not intended to limit the scope of protection of the present invention. Any modifications, equivalent substitutions, improvements, etc., made on the basis of the technical solution of the present invention should be included within the scope of protection of the present invention.
Claims
1. A risk detection method, comprising: The target content to be detected is input into the first large model to obtain the first prediction result, which indicates that the target content is high-risk, low-risk, or the risk status is undetermined. In response to the first prediction result indicating that the risk status is pending, the target content and the relevant risk discrimination rules are input into the second model to obtain a second prediction result, which indicates the type of risk to be detected or that the target content is low risk. In response to the second prediction result indicating the type of risk to be detected, the third major model corresponding to the risk type is invoked to process the target content and related context information to generate the target risk detection result.
2. The method according to claim 1, wherein, Before inputting the target content to be detected into the first large model, the method further includes: Based on the modality of the target content, the first largest model that supports the modality is selected from the risk screening model set.
3. The method according to claim 1, further comprising: In response to the first prediction result indicating high risk / low risk, the publication / interception of the target content is permitted.
4. The method according to claim 1, wherein, In response to the first prediction result indicating an undetermined risk status, the target content and related risk discrimination rules are input into the second major model, including: In response to the first prediction result indicating that the risk status of several risk types is pending, a preset risk discrimination rule base is retrieved based on the several risk types to obtain the corresponding risk discrimination rules; The target content and the retrieved risk discrimination rules are input into the second major model.
5. The method according to claim 1, wherein, In response to the first prediction result indicating an undetermined risk status, the target content and related risk discrimination rules are input into the second major model, including: In response to the first prediction result indicating an undetermined risk status, a preset risk discrimination rule base is retrieved based on the target content to obtain relevant risk discrimination rules; The target content and the retrieved risk discrimination rules are input into the second major model.
6. The method according to claim 1, further comprising: in response to the second prediction result indicating low risk, permitting publication of the target content.
7. The method according to claim 1, wherein the risk type to be detected comprises multiple risk types, and generating the target risk detection result comprises: invoking each of the third large models corresponding to the multiple risk types to process the target content and relevant context information, obtaining the risk detection result output by each of the third large models, and forming the target risk detection result from those results.
8. The method according to claim 1, wherein determining the relevant context information comprises: retrieving, based on the risk type to be detected, a preset risk discrimination rule base to obtain the corresponding risk discrimination rules; and/or retrieving, based on the target content, a preset domain knowledge base to obtain relevant domain knowledge; and/or obtaining historical content submitted within a predetermined lookback period before the submission of the target content, in the business scenario to which the target content belongs; and/or obtaining content published by the publisher of the target content in other business scenarios; and/or obtaining user profile features and a historical online behavior sequence of the publisher of the target content.
9. The method according to claim 1, wherein after generating the target risk detection result, the method further comprises: constructing a target sample based on the target content and the target risk detection result, the target sample being used to determine a historical sample set; identifying misjudged samples based on the historical sample set using a fourth large model; performing attribution analysis on the misjudged samples using a fifth large model; and processing the results of the attribution analysis using a sixth large model to obtain optimization suggestions for risk detection.
10. The method according to claim 9, wherein determining the historical sample set comprises: adding the target sample to an initial sample set; performing semantic clustering on the multiple detected contents involved in the initial sample set to obtain multiple clusters; and including in the historical sample set the samples corresponding to content sampled from the multiple clusters.
11. The method according to claim 9, wherein determining the historical sample set comprises: adding the target sample to an initial sample set; determining, from the initial sample set, several groups of samples that meet a predetermined condition, the predetermined condition being that the detected contents of the multiple samples in a group are semantically similar but their risk detection results differ; and including the several groups of samples in the historical sample set.
12. The method according to claim 9, wherein performing attribution analysis on the misjudged samples using the fifth large model comprises: processing, using the fifth large model, the misjudged samples and their corresponding context information, the context information including relevant knowledge retrieved from a knowledge base and background information searched using internet tools; wherein the results of the attribution analysis indicate that the reasons for misjudgment include a lack of background information, and the optimization suggestions include supplementing the knowledge base with knowledge extracted from the background information.
13. The method according to claim 1, wherein after generating the target risk detection result, the method further comprises: constructing a target sample from the target content and the target risk detection result, and adding the target sample to an original sample set; and selecting a high-quality sample set from the original sample set for training the third large model.
14. The method according to claim 13, wherein after selecting the high-quality sample set from the original sample set, the method further comprises: training the third large model by reinforcement learning using the high-quality sample set, wherein the reward function of the reinforcement learning takes the form of a reward score table, in which the element at row i and column j represents the reward score for classifying a sample of risk type i as risk type j.
15. The method according to claim 1, wherein the parameter sizes of the first large model, the second large model, and the third large model increase progressively.
16. A risk detection method, comprising: inputting target content to be detected and user features of its publishing user into a first large model to obtain a first prediction result, the first prediction result indicating that the publishing user is a high-risk user or a low-risk user, or that the user's risk status is undetermined; in response to the first prediction result indicating that the risk status is undetermined, inputting the target content, the user features, and relevant risk discrimination rules into a second large model to obtain a second prediction result, the second prediction result indicating a user risk type to be detected or that the publishing user is a low-risk user; and in response to the second prediction result indicating the user risk type to be detected, invoking a third large model corresponding to the user risk type to process the target content, the user features, and relevant context information, to generate a target risk detection result.
17. A risk detection device, comprising: a first prediction unit configured to input target content to be detected into a first large model to obtain a first prediction result, the first prediction result indicating that the target content is high-risk or low-risk, or that its risk status is undetermined; a second prediction unit configured to, in response to the first prediction result indicating that the risk status is undetermined, input the target content and relevant risk discrimination rules into a second large model to obtain a second prediction result, the second prediction result indicating a risk type to be detected or that the target content is low-risk; and a third prediction unit configured to, in response to the second prediction result indicating the risk type to be detected, invoke a third large model corresponding to the risk type to process the target content and relevant context information, to generate a target risk detection result.
18. A risk detection device, comprising: a first prediction unit configured to input target content to be detected and user features of its publishing user into a first large model to obtain a first prediction result, the first prediction result indicating that the publishing user is a high-risk user or a low-risk user, or that the user's risk status is undetermined; a second prediction unit configured to, in response to the first prediction result indicating that the risk status is undetermined, input the target content, the user features, and relevant risk discrimination rules into a second large model to obtain a second prediction result, the second prediction result indicating a user risk type to be detected or that the publishing user is a low-risk user; and a third prediction unit configured to, in response to the second prediction result indicating the user risk type to be detected, invoke a third large model corresponding to the user risk type to process the target content, the user features, and relevant context information, to generate a target risk detection result.
19. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed in a computer, causes the computer to perform the method of any one of claims 1-16.
20. A computing device comprising a memory and a processor, wherein the memory stores executable code, and the processor, when executing the executable code, implements the method of any one of claims 1-16.
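
The three-stage cascade recited in claim 1, together with the dispositions of claims 3 and 6, can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the applicant's implementation: the three large models are replaced by plain Python callables, the rule and context retrieval are stubbed, and every name (`Screen`, `detect_risk`, `retrieve_rules`, `retrieve_context`, and the `toy_*` helpers) is hypothetical.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional


class Screen(Enum):
    """Possible outputs of the (hypothetical) first-stage screening model."""
    HIGH = "high_risk"
    LOW = "low_risk"
    UNDETERMINED = "undetermined"


def detect_risk(
    content: str,
    first_model: Callable[[str], Screen],
    second_model: Callable[[str, List[str]], Optional[str]],
    third_models: Dict[str, Callable[[str, List[str]], str]],
    retrieve_rules: Callable[[str], List[str]],
    retrieve_context: Callable[[str, str], List[str]],
) -> str:
    # Stage 1: lightweight screening; clear cases are settled here.
    stage1 = first_model(content)
    if stage1 is Screen.HIGH:
        return "intercept"
    if stage1 is Screen.LOW:
        return "publish"

    # Stage 2: only undetermined content reaches the second model,
    # together with retrieved risk discrimination rules.
    rules = retrieve_rules(content)
    risk_type = second_model(content, rules)  # None stands for "low risk"
    if risk_type is None:
        return "publish"

    # Stage 3: the model dedicated to that risk type sees extra context.
    context = retrieve_context(content, risk_type)
    return third_models[risk_type](content, context)


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_first(content: str) -> Screen:
    if "scam" in content:
        return Screen.HIGH
    if "offer" in content:
        return Screen.UNDETERMINED
    return Screen.LOW


def toy_second(content: str, rules: List[str]) -> Optional[str]:
    # A "rule" here is just a keyword; a real rule would be natural language.
    return "fraud" if any(rule in content for rule in rules) else None


def toy_third_fraud(content: str, context: List[str]) -> str:
    return "fraud: confirmed" if context else "fraud: not confirmed"


def run_toy(content: str) -> str:
    return detect_risk(
        content,
        first_model=toy_first,
        second_model=toy_second,
        third_models={"fraud": toy_third_fraud},
        retrieve_rules=lambda c: ["guaranteed"],
        retrieve_context=lambda c, t: ["publisher has prior fraud reports"],
    )


if __name__ == "__main__":
    for text in ["hello world", "obvious scam", "limited offer", "guaranteed offer"]:
        print(text, "->", run_toy(text))
```

In this sketch only content that the first stage cannot settle incurs the cost of the later stages, which mirrors the progressive increase in model size recited in claim 15.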