A media content compliance review system and method based on legal rules
By breaking media content down into nodes and constructing a dependency structure, the invention addresses the insufficient legal-semantic compliance identification of existing technologies, enabling efficient and accurate compliance review and improving compliance and risk identification.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HEILONGJIANG VOCATIONAL INST OF ECOLOGICAL ENG
- Filing Date
- 2026-03-25
- Publication Date
- 2026-06-30
AI Technical Summary
Existing media content compliance review systems struggle to identify compliance accurately at the legal semantic level, especially where legal rules are involved. They cannot effectively determine whether content contains misleading statements about legal obligations and the boundaries of rights, so compliant content may be misjudged and content carrying compliance risks may be released.
The media content to be reviewed is broken down into nodes at the paragraph and syntactic levels, and subjects, behaviors, and contextual elements are marked; combined with the scope of application and sensitivity of regulations, these form a set of legal elements. Cross-paragraph dependency checks, nested case-law structures, and citation relationships between provisions are then used to construct a dependency structure, identify potential overstepping of obligations and deviations from legal concepts, and generate standardized compliance reference text fragments, after which a hierarchical compliance review is conducted.
The invention improves the structure and accuracy of media content compliance review, reduces the subjectivity of manual review, enhances the ability to identify potential violations, and ensures legal compliance before media content is published.
Figure CN122309778A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of media content review, specifically to a media content compliance review system and method based on legal rules.
Background Technology
[0002] With the rapid development of short video platforms, information aggregation platforms, and corporate self-media matrices, media content typically needs to undergo compliance review before publication to prevent violations of laws, regulations, policies, or industry norms. Existing media content compliance review systems mostly use keyword matching or content identification methods based on statistical models, which have a certain ability to identify obviously non-compliant expressions. However, in practical applications, media content often involves general descriptions or paraphrases of legal provisions, policy clauses, or judicial precedents. Its compliance is not determined solely by a single sensitive word, but is closely related to the specific legal rules, the relationships between the subjects, and the context of the expression.
[0003] For example, when companies publish articles interpreting regulatory policies, the same legal concept may have different compliance boundaries under different applicable conditions. Existing review methods, due to a lack of characterization of the internal structure and applicable logic of legal rules, often can only make static comparisons of the surface text, making it difficult to determine whether the content contains misleading statements about legal obligations and rights boundaries. This can lead to compliant content being misjudged and blocked, or content with compliance risks being released.
[0004] Therefore, it is essential to design a media content compliance review system and method based on legal rules that enhances the ability to identify compliance at the legal semantic level.
Summary of the Invention
[0005] To address the shortcomings of existing technologies, this invention provides a media content compliance review system and method based on legal rules, which has the advantage of improving the ability to identify compliance at the legal semantic level and solves the problems mentioned in the background technology.
[0006] To achieve the aforementioned goal of enhancing compliance identification at the legal semantic level, this invention provides the following technical solution: a media content compliance review method based on legal rules, comprising the following steps: The media content to be reviewed is broken down into nodes at the paragraph and syntactic levels; subjects, behaviors, contextual elements, and contextually uncertain regions are marked; and a set of legal elements is formed by combining the scope of application and the sensitivity of the regulations. Cross-paragraph dependency checks and multi-level matching are performed on the set of legal elements, and a dependency structure for content compliance behavior is established by combining the nested structure of case precedents, the citation relationships between provisions, and the logical relationships of context. Based on the dependency structure, constraints are filled in for missing, ambiguous, or logically incomplete legal components in the text, and standardized compliance reference text fragments are generated through rule unit consistency verification, paragraph logical connectivity analysis, and boundary consistency detection. The text to be reviewed is matched hierarchically and compared by weight against the standardized compliance reference text fragments to calculate the subject correspondence, behavioral component matching degree, applicable situation fit, and legal provision boundary compliance, forming a compliance score. Based on the compliance score and the applicable rules, potential overstepping of obligations, deviations from legal concepts, and interpretational biases are identified, and the compliance level and review classification are dynamically determined by combining the weights of the rule units, generating a graded compliance review result.
[0007] Preferably, the process of forming the set of legal elements is as follows: For the media content to be reviewed, a multi-level text node structure is established according to paragraphs, sentences, and clauses. The subject, behavior, and contextual constraints are extracted for each node, and the syntactic dependencies and semantic relationships between nodes are marked. Based on the scope-of-application information in the current legal provisions database, a regulatory adaptation analysis is conducted on the subject categories, behavior types, and dissemination contexts involved in the nodes; at the same time, nodes are classified into risk levels based on preset sensitivity indicators, forming a node attribute labeling table. The node attribute labeling table is integrated with the syntactic dependency relationships to identify uncertain regions in the text that have semantic jumps, ambiguous referents, or incomplete contexts, and a node feature matrix is constructed based on the semantic correlation between nodes. The legal attributes of each node are aggregated based on the node feature matrix to generate a set of legal elements containing subject elements, behavioral components, contextual limitations, and risk level information.
[0008] Preferably, the process of performing cross-paragraph dependency checks and multi-level matching on the set of legal elements is as follows: The node feature matrix in the legal element set is indexed and sorted according to paragraph order, and nodes in different paragraphs that have a subject association, behavioral causality, or situational continuity relationship are preliminarily screened. Based on the indicators of subject consistency, behavioral logic relevance, and contextual continuity, multi-level matching calculations are performed on the screened nodes to identify legal components in the text that have cross-paragraph dependencies. By evaluating the logical consistency of the matching results, the subject association relationships, behavioral causal relationships, and contextual continuity relationships between nodes in different paragraphs are identified. Based on the identified relationships, node connection edges are established to construct a cross-paragraph dependency graph.
[0009] Preferably, the process of establishing the dependency structure for content compliance behavior is as follows: Legal provisions are mapped to each node in the cross-paragraph dependency graph. The subject type, behavior type, and situation category involved in the node are matched against the legal provisions database to determine the set of applicable legal provisions. By combining the judgment logic in the historical case database, cases with similar behavioral patterns or legal application scenarios are retrieved, and the nested structure and judgment conditions of the cases are extracted. By jointly mapping the citation relationships between provisions, the nested case structures, and the logical relationships between nodes, a multi-dimensional dependency network is constructed that describes the relationship between subject behavior and legal constraints. By analyzing the node connection strength and path structure in the dependency network, key legal constraint nodes and core logical paths are identified, forming a dependency structure for content compliance behavior.
[0010] Preferably, the process of filling in the constraints for missing, ambiguous, or logically incomplete legal components in the text is as follows: Nodes in the dependency structure that do not form a complete logical link are identified, including legal components with missing behavioral premises and legal components with unclear definitions of subject responsibility. Based on the key constraint nodes and applicable conditions of the identified nodes, supplementary constraints are extracted from the legal provisions database and the case law database, and a candidate list of legal components for completion is generated. The candidate components in this list are screened for logical consistency, checking that each agrees with the original nodes in subject relationship, behavioral logic, and applicable scope of the situation. The selected completion components are embedded into the dependency structure, and the legal attributes and logical relationships of the relevant nodes are updated to generate an updated legal component structure table.
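The first step of this process, identifying behavior nodes that do not form a complete logical link, can be illustrated with a minimal Python sketch. The relation labels (`performed_by`, `triggered_by`, `constrained_by`) and the function name are hypothetical and are not specified in the source; the sketch assumes the dependency structure is available as a list of labeled edges.

```python
from collections import defaultdict

# Hypothetical relation labels: what a behavior node needs to form a complete link.
REQUIRED = {
    "subject": "performed_by",             # behavior -> acting subject
    "premise": "triggered_by",             # behavior -> triggering condition
    "legal constraint": "constrained_by",  # behavior -> applicable provision
}

def find_incomplete_links(behavior_nodes, edges):
    """Return {node_id: [missing components]} for behavior nodes with gaps.

    edges is an iterable of (source_id, relation, target_id) tuples.
    """
    outgoing = defaultdict(set)
    for src, relation, _dst in edges:
        outgoing[src].add(relation)

    gaps = {}
    for node in behavior_nodes:
        missing = [name for name, rel in REQUIRED.items()
                   if rel not in outgoing[node]]
        if missing:
            gaps[node] = missing
    return gaps

edges = [
    ("b1", "performed_by", "s1"),
    ("b1", "triggered_by", "p1"),
    ("b1", "constrained_by", "law1"),
    ("b2", "constrained_by", "law1"),  # consequence stated, but no subject or premise
]
print(find_incomplete_links(["b1", "b2"], edges))
```

The resulting list of gaps would serve as the target for the constraint completion steps that follow.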
[0011] Preferably, the process of generating standardized compliance reference text fragments is as follows: Perform a rule unit consistency check on the updated legal component structure table to check whether each component meets the applicable conditions of the corresponding legal provisions and the case interpretation rules; Based on the dependency structure, a connectivity analysis is performed on the logical connection relationships between the nodes of each paragraph to determine whether the logical order between the main behavior and the contextual constraints is complete. By using boundary consistency detection, adjustments are made to the semantic connections between paragraphs, the boundaries of the scope of legal application, and the boundaries of behavioral descriptions; Based on the verification and adjustment results, standardized compliance reference text fragments covering subjects, behaviors, contexts, and legal constraints are generated.
[0012] Preferably, the process of forming the compliance score is as follows: The text to be reviewed is mapped and matched against the standardized compliance reference text fragments at multiple levels, including the paragraph, sentence, and legal component levels, and the semantic similarity and structural consistency indicators at each level are extracted. Based on the correspondence between subjects, the consistency of behavioral descriptions, and the applicable conditions of the context, the subject correspondence degree, behavioral component matching degree, and context fit degree indicators are calculated, and a weight is assigned to each indicator. By combining the legal citation boundaries in the text, the matching results are checked for compliance with those boundaries to determine whether the behavioral description exceeds or deviates from the scope of application of the regulations. The above indicators are then combined into a comprehensive compliance score.
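As an illustration of how the indicators might be combined, the following Python sketch computes a weighted average of four indicators, each assumed to be normalized to the range 0 to 1. The weights, the indicator keys, and the choice of a weighted average are assumptions made for this example; the source does not specify the aggregation formula.

```python
# Assumed weights for illustration only; the source does not give values.
ASSUMED_WEIGHTS = {
    "subject": 0.3,   # subject correspondence degree
    "behavior": 0.3,  # behavioral component matching degree
    "context": 0.2,   # context fit degree
    "boundary": 0.2,  # legal provision boundary compliance
}

def compliance_score(indicators, weights=ASSUMED_WEIGHTS):
    """Weighted average of indicators, each expected in [0, 1]."""
    for name in weights:
        value = indicators[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"indicator {name!r} out of range: {value}")
    total_weight = sum(weights.values())
    return sum(weights[n] * indicators[n] for n in weights) / total_weight

score = compliance_score(
    {"subject": 1.0, "behavior": 0.5, "context": 0.5, "boundary": 1.0}
)
print(round(score, 3))
```

Dividing by the total weight keeps the score in the same 0 to 1 range even if the weights are later retuned and no longer sum to one.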
[0013] Preferably, the process for identifying potential overstepping of obligations, deviations in legal concepts, and interpretational biases is as follows: Threshold determination is performed on each dimension indicator in the compliance score to identify nodes where the subject correspondence, behavior matching, or situation fit is lower than the preset threshold. Perform legal rule back-analysis on the identified low-matching nodes to determine whether there is an expansion of the scope of obligations or a conflation of legal concepts; Based on the applicable conditions of the rules and historical review cases, the risk levels of deviation nodes are classified, and a corresponding deviation node marking table is generated.
[0014] Preferably, the process for generating tiered compliance review results is as follows: The overall compliance risk index is calculated based on the risk level and rule unit weight of each node in the deviation node labeling table. Based on the compliance risk index and preset grading standards, the content to be reviewed is graded and judged. Based on the risk level corresponding to each node, risk markers are generated for each paragraph and node of the text, and a review report containing the text location, risk type, and suggested modification direction is output. The review results from each node are aggregated to generate the final tiered compliance review results.
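A minimal Python sketch of this grading step is given below. The numeric values for the risk levels, the grade boundaries, the grade labels, and the row field names are all assumptions for illustration; the source only states that the index is derived from node risk levels and rule unit weights and compared against preset grading standards.

```python
# Assumed numeric mapping and grade boundaries; not taken from the source.
LEVEL_VALUE = {"low": 1, "medium": 2, "high": 3}
GRADE_BOUNDS = [(0.4, "pass"), (0.7, "manual review")]  # index >= 0.7 -> "reject"

def risk_index(rows):
    """Rule-weighted mean risk level of the deviation nodes, scaled to [0, 1]."""
    total_weight = sum(r["rule_weight"] for r in rows)
    if total_weight == 0:
        return 0.0  # no deviation nodes, or all weights are zero
    weighted = sum(LEVEL_VALUE[r["risk_level"]] * r["rule_weight"] for r in rows)
    return weighted / (max(LEVEL_VALUE.values()) * total_weight)

def grade(index):
    for upper, label in GRADE_BOUNDS:
        if index < upper:
            return label
    return "reject"

def review_report(rows):
    """Aggregate per-node markers into a graded review result."""
    index = risk_index(rows)
    markers = [
        {"location": r["location"], "risk_type": r["risk_type"],
         "risk_level": r["risk_level"]}
        for r in rows
    ]
    return {"risk_index": index, "grade": grade(index), "markers": markers}

deviation_table = [
    {"location": "P1.S2", "risk_type": "obligation overreach",
     "risk_level": "high", "rule_weight": 2.0},
    {"location": "P3.S1", "risk_type": "concept deviation",
     "risk_level": "low", "rule_weight": 1.0},
]
print(review_report(deviation_table)["grade"])
```

A fuller report would also carry the suggested modification direction for each marker, which is omitted here because the source does not describe how it is produced.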
[0015] A media content compliance review system based on legal rules, comprising: Element analysis module: breaks down the media content to be reviewed into nodes and marks the subject, behavior, and contextual elements to generate a set of legal elements; Dependency building module: performs cross-paragraph matching on the set of legal elements and establishes a content compliance dependency structure by combining case law and the relationships between provisions; Rule completion module: based on the dependency structure, completes the rules for missing or incomplete legal components and generates standardized compliance reference text fragments; Compliance assessment module: performs hierarchical matching between the text to be reviewed and the reference text and calculates the compliance indicators to form a compliance score; Risk assessment module: identifies deviation points based on the compliance score and determines the corresponding compliance level and review classification results.
[0016] Compared with existing technologies, this invention provides a media content compliance review system and method based on legal rules, which has the following beneficial effects: This invention transforms unstructured text content into a set of elements with legal semantic relationships by performing node-based segmentation at the paragraph and syntactic levels and structurally marking subject, behavior, and contextual elements, which improves the accuracy of text parsing. It then constructs a dependency structure for content compliance behavior through cross-paragraph dependency checks combined with case law nesting, article citation relationships, and contextual logic, establishing a clear connection between the behavioral logic in the text and legal norms. Further, it fills in missing, ambiguous, or logically incomplete legal components with constraints and generates standardized compliance reference text fragments by combining rule unit consistency checks, paragraph logical connectivity analysis, and boundary consistency detection, which provides a unified compliance comparison benchmark for the subsequent review process. Subsequently, through hierarchical matching and weighted comparison of the text to be reviewed against the reference text, it calculates the subject correspondence, behavioral component matching, contextual fit, and legal boundary compliance to achieve a quantitative assessment of the media content's compliance level. Finally, it identifies potential risk points such as overstepping of obligations, deviations from legal concepts, and interpretational biases based on the applicable rules, and dynamically determines the compliance level according to the rule unit weights, thus forming a tiered compliance review result. Through the above technical solution, this invention can effectively improve the structure and accuracy of media content compliance review, reduce the subjectivity and workload of manual review, enhance the ability to identify potential violations, and thus strengthen the guarantee of legal compliance before media content is published.
Attached Figure Description
[0017] Figure 1 is a schematic diagram of the method of the present invention; Figure 2 is a schematic diagram of the structure of the present invention.
Detailed Implementation
[0018] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0019] Example 1: Referring to Figure 1, a media content compliance review method based on legal rules in an embodiment of the present invention includes the following steps: S1: The media content to be reviewed is broken down into nodes at the paragraph and syntactic levels; subjects, behaviors, contextual elements, and contextually uncertain regions are marked; and a set of legal elements is formed by combining the scope of application and the sensitivity of the regulations.
[0020] The process of forming the set of legal elements in S1 is as follows:
A text node structure is established for the media content to be reviewed according to paragraph, sentence, and clause levels. The text to be reviewed is imported into the text processing system and preprocessed by the natural language processing module, which removes redundant format symbols, unifies the encoding format, and performs sentence segmentation. The text is divided into first-level nodes at paragraph boundaries. Within each paragraph, sentences are split into second-level nodes at punctuation marks such as periods, question marks, or semicolons. The clause structure within each sentence is then identified using a dependency parsing algorithm, and third-level nodes are established according to the main-subordinate relationship. Each node is assigned a unique node number, and its paragraph position, sentence position, and clause level are recorded, yielding a hierarchical text node structure tree.
For each node, the subject, behavior, and contextual constraints are extracted, and the syntactic dependencies and semantic orientations between nodes are marked. The subject is determined by identifying entity categories such as institution name, individual identity, regulatory body, or social organization; behaviors are extracted by verb part-of-speech recognition and behavior dictionary matching; and contextual constraints are obtained by identifying semantic components such as time, place, conditional adverbs, and policy background descriptions. While the elements are extracted, the subject-predicate, verb-object, and modification relationships between nodes are determined through dependency parsing. A reference resolution algorithm is also applied to identify the actual objects referred to by pronouns or general expressions. The syntactic dependencies and semantic orientations between nodes are thereby marked structurally, forming a semantic association graph that reflects the internal logical relationships of the text.
Based on the scope-of-application information in the existing legal provisions database, a regulatory adaptation analysis is conducted on the subject categories, behavior types, and dissemination contexts involved in the nodes. A legal provisions database containing legal provisions, regulatory policies, and industry norms is constructed in advance, and each legal rule in the database is labeled with its applicable subject type, regulated behavior type, and applicable scenario conditions. The subject category, behavior type, and context information extracted from the nodes are matched against this database. Through keyword matching, semantic similarity calculation, and rule filtering, potentially applicable legal rules are screened out, and the degree of adaptation between each node and the corresponding legal rules is calculated. This determines the scope of legal norms that each node may involve and clarifies, at the legal semantic level, which regulatory rules correspond to the specific expressions in the media content.
At the same time, nodes are classified into risk levels based on preset sensitivity indicators, forming a node attribute labeling table. A sensitivity evaluation index system is constructed from the type of legal rules involved in the node, the sensitivity of the subject's identity, and the potential impact of the behavior. For example, content involving financial supervision, medical advertising, data security, or the protection of minors is assigned a higher sensitivity level. The sensitivity of each node is calculated according to preset scoring rules, and nodes are divided into low-risk, medium-risk, or high-risk levels. Information such as node number, subject category, behavior type, applicable laws and regulations, and risk level is recorded to generate the node attribute labeling table.
The node attribute labeling table is then fused with the previously established syntactic dependency graph to identify uncertain regions in the text that contain semantic jumps, ambiguous referents, or incomplete contexts, and a node feature matrix is constructed based on the semantic correlation between nodes. By analyzing the semantic connection between adjacent nodes and the consistency of subject and behavior, possible semantic breaks, unclear referents, or incomplete context conditions are identified. For example, if a node only describes the result of a behavior without specifying the subject or the applicable conditions, it can be identified as an incomplete context region. These regions are marked as uncertain, and the node correlation is calculated from the semantic similarity between nodes, the strength of the dependency relationship, and the consistency of regulatory adaptation. The result is a node feature matrix reflecting node characteristics and correlations.
The legal attributes of each node are aggregated based on the node feature matrix to generate a set of legal elements containing subject elements, behavioral components, contextual constraints, and risk level information. Nodes with high semantic relevance and similar legal attributes are merged through clustering analysis or rule aggregation algorithms, and nodes involving the same subject, the same behavior, or the same applicable legal context are integrated into a unified legal element unit. During aggregation, the subject category, behavior description, contextual constraints, and risk level information of each element unit are retained, and a mapping to the original text nodes is established, forming a structured set of legal elements.
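The three-level node structure described at the start of this step can be sketched in Python as follows. This is a simplified illustration, not the described implementation: clause boundaries are approximated by splitting on commas, which is a crude stand-in for the dependency parsing the source calls for, and the `Node` class, the `P1.S2.C1` numbering scheme, and the function name are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str   # hypothetical numbering, e.g. "P1.S2.C1"
    level: str     # "paragraph", "sentence", or "clause"
    text: str
    children: list[Node] = field(default_factory=list)

# Sentences end at periods, question marks, exclamation marks, or semicolons.
SENTENCE_END = re.compile(r"(?<=[.?!;])\s+")
# Assumption: comma splitting stands in for dependency-based clause detection.
CLAUSE_SPLIT = re.compile(r",\s+")

def build_node_tree(document: str) -> list[Node]:
    """Split text into paragraph > sentence > clause nodes with unique ids."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]
    tree = []
    for pi, ptext in enumerate(paragraphs, start=1):
        pnode = Node(f"P{pi}", "paragraph", ptext)
        sentences = [s for s in SENTENCE_END.split(ptext) if s]
        for si, stext in enumerate(sentences, start=1):
            snode = Node(f"P{pi}.S{si}", "sentence", stext)
            clauses = [c for c in CLAUSE_SPLIT.split(stext) if c]
            for ci, ctext in enumerate(clauses, start=1):
                snode.children.append(
                    Node(f"P{pi}.S{si}.C{ci}", "clause", ctext))
            pnode.children.append(snode)
        tree.append(pnode)
    return tree

doc = ("The platform must verify ads. If it fails, penalties apply.\n\n"
       "Users may appeal; the regulator decides.")
for paragraph in build_node_tree(doc):
    for sentence in paragraph.children:
        print(sentence.node_id, [c.text for c in sentence.children])
```

Because each node id encodes its paragraph, sentence, and clause position, later steps can map aggregated legal element units back to their original text locations.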
[0021] S2: Perform cross-paragraph dependency checks and multi-level matching on the set of legal elements, and establish a dependency structure for content compliance behavior by combining the nested structure of case precedents, the relationship between citations of provisions and the logical relationship of context.
[0022] The process of performing cross-paragraph dependency checks and multi-level matching on the set of legal elements in S2 is as follows:
The node feature matrix in the legal element set is indexed and sorted according to paragraph order, and nodes in different paragraphs that have a subject association, behavioral causality, or situational continuity relationship are preliminarily screened. The node feature matrix and the corresponding paragraph number information are extracted from the legal element set, and all nodes are uniformly indexed and sorted by paragraph order and position of appearance in the original text, forming a node sequence with temporal and structural order. The subject category, behavioral components, and situational limitation information recorded in the node feature matrix are then used to screen for associations between nodes in different paragraphs. For example, when two nodes have the same or highly similar subject categories, they are determined to have a potential subject association. When the behavioral result of one node is logically connected to the behavioral premise of another node, they are determined to have a potential behavioral causal relationship. When different nodes give continuous descriptions of time conditions, policy background, or applicable scenarios, they are identified as having a potential situational continuity relationship. This screening extracts the node pairs that may have cross-paragraph dependencies.
Based on the indicators of subject consistency, behavioral logic relevance, and contextual continuity, multi-level matching calculations are performed on the screened nodes to identify legal components in the text that have cross-paragraph dependencies. The subject consistency indicator is determined by comparing the subject entity names, subject categories, and referential relationships in the nodes. The behavioral logic relevance indicator is calculated by analyzing the causal, conditional trigger, or result description relationships between behaviors. The contextual continuity indicator is obtained by semantic similarity analysis of information such as time conditions, applicable background, and policy context. These indicators are combined according to preset weights to obtain the matching score between nodes. When the matching score exceeds the set threshold, the node pair is determined to have a cross-paragraph dependency at the legal semantic level, and the corresponding legal components are identified, thereby determining the actual cross-paragraph legal logic structure of the text.
By evaluating the logical consistency of the matching results, the subject association relationships, behavioral causal relationships, and contextual continuity relationships between nodes in different paragraphs are identified, confirming that the dependencies conform to reasonable legal semantic logic. The consistency of subject attributes in each node pair is checked, for example whether the subjects keep the same identity category in the legal context. Behavioral logic rules are used to analyze whether there is a reasonable causal or sequential relationship between behaviors, for example whether one behavior is a prerequisite or legal consequence of another. The continuity of contextual constraints is also verified, for example whether the time conditions, applicable policy background, and contextual descriptions are continuous. If the logical conditions for a relationship are met, a clear subject association, behavioral causal relationship, or contextual continuity relationship between the nodes is confirmed, further improving the accuracy of cross-paragraph dependency identification.
Based on the identified relationships, node connection edges are established to construct a cross-paragraph dependency graph. After the logical consistency assessment has confirmed the dependencies between nodes, each node is regarded as a vertex in the graph structure, and connection edges are established between nodes according to the identified subject associations, behavioral causal relationships, and contextual continuity relationships. Different types of dependencies are labeled with different edge attributes. All nodes and their connections are integrated into a cross-paragraph dependency graph, and the relationship type and matching weight of each connection edge are recorded to reflect the logical structure between legal elements within the text.
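The weighted matching and edge construction described above can be sketched as follows. The weights, the threshold, the exact-match subject test, the result-equals-premise causality test, and the Jaccard overlap used for contextual continuity are all simplifying assumptions chosen for this example; the source describes these indicators only in general terms.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class ElementNode:
    node_id: str
    paragraph: int
    subject: str
    premise: str        # condition the behavior depends on ("" if none)
    result: str         # outcome the behavior produces ("" if none)
    context: frozenset  # context tags such as time or policy background

# Assumed weights and threshold; the source says only that they are preset.
WEIGHTS = {"subject": 0.4, "behavior": 0.35, "context": 0.25}
THRESHOLD = 0.6

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def indicators(a, b):
    return {
        "subject": 1.0 if a.subject == b.subject else 0.0,
        # Causality proxy: the earlier node's result is the later node's premise.
        "behavior": 1.0 if a.result and a.result == b.premise else 0.0,
        "context": jaccard(a.context, b.context),
    }

def build_dependency_graph(nodes):
    """Return edges (src, dst, score, relation_types) across paragraphs."""
    ordered = sorted(nodes, key=lambda n: (n.paragraph, n.node_id))
    edges = []
    for a, b in combinations(ordered, 2):
        if a.paragraph == b.paragraph:
            continue  # only cross-paragraph pairs are of interest here
        parts = indicators(a, b)
        score = sum(WEIGHTS[k] * parts[k] for k in WEIGHTS)
        if score >= THRESHOLD:
            relation_types = [k for k, v in parts.items() if v > 0]
            edges.append((a.node_id, b.node_id, round(score, 3), relation_types))
    return edges

nodes = [
    ElementNode("n1", 1, "platform", "", "ad_published",
                frozenset({"advertising", "2024"})),
    ElementNode("n2", 3, "platform", "ad_published", "fine",
                frozenset({"advertising"})),
    ElementNode("n3", 2, "user", "", "", frozenset({"privacy"})),
]
print(build_dependency_graph(nodes))
```

Each returned edge records both its matching weight and the relation types that contributed to it, mirroring the edge attributes described for the cross-paragraph dependency graph.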
[0023] The process of establishing the dependency structure for content compliance behavior in S2 is as follows: Legal provisions are mapped to each node in the cross-segment dependency graph. The subject type, behavior type, and situation category involved in the node are matched with the legal provisions database to determine the set of applicable legal provisions. Based on the cross-segment dependency graph, the attribute information such as subject type, behavior type, and situation category of each node in the graph is extracted. A legal provisions database containing laws, regulations, administrative rules, and industry standards is pre-established. Each legal provision in the database is labeled with its applicable subject category, standardized behavior type, and applicable situation conditions. The node attributes are compared with the labeled information in the legal provisions database using semantic matching and rule matching algorithms. The matching results are recorded as the set of legal provisions corresponding to the node. At the same time, the confidence score of the matching results can be calculated. Provisions with scores higher than a preset threshold are retained as the applicable legal basis for the node, thereby realizing the mapping relationship between the node's legal attributes and legal provisions. By combining the judgment logic in the historical case database, cases with similar behavioral patterns or legal application scenarios are retrieved, and the nested structure and judgment conditions of the cases are extracted. A case database containing historical judicial precedents, administrative penalty cases, and regulatory notice cases can be constructed. Each case is structured and labeled with information such as the case subject type, behavioral pattern, applicable legal provisions, and judgment conclusion. 
Based on the behavioral patterns and applicable legal provisions described by the nodes in the cross-segment dependency graph, similarity retrieval algorithms are used to search for cases with similar behavioral characteristics or legal application scenarios in the case database. For example, candidate cases are screened using behavioral keyword matching, semantic similarity calculation, and contextual label matching. The judgment logic structure of the retrieved cases is further extracted, including the order of legal provisions cited by the court in the judgment process, the conditions for key fact-finding, and the logical relationship between behavior and legal responsibility, thereby forming a nested structure and judgment conditions of cases that can reflect the path of legal application. By jointly mapping the citation relationships of legal provisions, the nested structure of case precedents, and the logical relationships of nodes, a multidimensional dependency network describing the relationship between subject behavior and legal constraints is constructed. Nodes, legal provisions, and case precedent logical units can be regarded as different types of nodes in the network, and connecting edges are established based on the citation relationships, behavioral logical relationships, and legal application relationships between nodes. For example, a legal constraint relationship is established between a node and a legal provision, and a case precedent reference relationship is established between a node and a case precedent logical unit. The original subject association, behavioral causal relationship, and situational continuity relationship between nodes are preserved, thus constructing a multidimensional dependency network containing three types of elements: text nodes, legal provisions, and case precedent logic. 
By analyzing the node connection strength and path structure in the dependency network, key legal constraint nodes and core logical paths are identified, forming a dependency structure for content compliance behavior. Structural analysis of the multidimensional dependency network is performed, evaluating the importance of each node by calculating the weights of connecting edges and path distribution. For example, weights are assigned to connecting edges based on the matching confidence of nodes with legal provisions, the frequency of case citations, and the semantic relevance between nodes. Graph structure analysis algorithms are used to calculate the centrality or path importance of nodes. When a node connects to multiple legal provisions or occupies a key position in multiple behavioral logical paths, it can be identified as a key legal constraint node. Further analysis of the main connection paths from behavioral nodes to legal constraint nodes extracts the core path structure that reflects the logical relationship between behavioral descriptions and legal norms, thus forming a dependency structure for content compliance behavior that describes the relationship between the subject's behavior and legal constraints in media content.
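The structural analysis described above can be illustrated with a minimal sketch. It assumes the simplest possible importance measure, weighted degree (the sum of edge weights touching a node), in place of whichever centrality or path-importance algorithm an implementation would actually choose. The node identifiers, relation names, edge weights, and the `min_strength` cutoff are all hypothetical.

```python
from collections import defaultdict

# Hypothetical edge list: (source, target, relation, weight).
# Weights stand in for matching confidence, case citation frequency,
# or semantic relevance; the values are illustrative only.
EDGES = [
    ("text:n1", "text:n2", "behavioral_causal", 0.6),
    ("text:n1", "law:art_12", "legal_constraint", 0.9),
    ("text:n2", "law:art_12", "legal_constraint", 0.8),
    ("text:n2", "case:c_7", "case_reference", 0.5),
    ("law:art_12", "case:c_7", "citation", 0.7),
]


def weighted_degree(edges):
    """Sum the weights of all edges touching each node (undirected view)."""
    strength = defaultdict(float)
    for source, target, _relation, weight in edges:
        strength[source] += weight
        strength[target] += weight
    return dict(strength)


def key_constraint_nodes(edges, min_strength):
    """Return (node, strength) pairs at or above min_strength, strongest first."""
    strength = weighted_degree(edges)
    selected = [
        (node, round(value, 3))
        for node, value in strength.items()
        if value >= min_strength
    ]
    return sorted(selected, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(key_constraint_nodes(EDGES, min_strength=1.8))
```

Weighted degree only captures connection strength; identifying core logical paths from behavioral nodes to legal constraint nodes would need an additional path search, which this sketch does not attempt.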
[0024] S3: Based on the dependency structure, fill in constraints for missing, ambiguous, or logically incomplete legal components in the text, and generate standardized compliance reference text fragments through rule unit consistency verification, paragraph logical connectivity analysis, and boundary consistency detection.
[0025] The process of filling in constraints for missing, ambiguous, or logically incomplete legal components in the text in S3 is as follows: In the dependency structure, nodes that do not form a complete logical link are identified, including legal components with missing behavioral premises and unclear definitions of subject responsibility. Based on the content compliance behavior dependency structure, each node and its upstream and downstream connection paths are traversed and analyzed. A graph structure traversal algorithm is used to check whether a complete legal logical link can be formed between nodes. For example, for a certain behavior node, it is necessary to determine whether there is a clear behavior subject node and a corresponding legal constraint node. If it is found that the behavior node does not point to a specific subject, or the subject node does not form a clear connection with the legal responsibility node, it can be determined that the node has a problem of unclear definition of subject responsibility. At the same time, it can also be determined whether the behavior description lacks necessary behavioral premises or applicable conditions based on legal logic rules. For example, if the text only describes the consequences of a certain behavior without specifying the triggering conditions, it can be identified as a legal component with missing behavioral premises. Each node in the dependency structure is checked one by one to form a list of nodes that do not form a complete logical link, providing a clear target for supplementing the constraint conditions. Based on the key constraint nodes and applicable conditions of the identified nodes, supplementary constraints are extracted from the legal provisions database and case law database, and a candidate list of legal component completions is generated. 
Based on the key legal constraint nodes connected to the node in the dependency structure and the legal provisions they have matched, the applicable conditions, behavioral prerequisites, or liability definition rules stipulated in the corresponding provisions are extracted from the legal provisions database. At the same time, cases similar to this type of behavior are searched in the case law database. By extracting the key conditions or factual elements cited by the courts in determining the liability for the behavior in the cases, supplementary conditions are formed. The various supplementary conditions extracted from the legal provisions database and case law database are integrated and classified according to subject conditions, behavioral prerequisite conditions, and situational applicable conditions to generate a candidate list of legal component completions, providing optional completion schemes for screening and embedding. The candidate legal component completion list is subjected to logical consistency screening to determine the consistency between the completion component and the original node in terms of subject relationship, behavioral logic, and applicable context. Each candidate completion component is subjected to rule matching and semantic consistency analysis. Subject consistency is used to determine whether the subject category in the completion component matches the subject type described by the original node. For example, if the subject of the original node is a corporate entity, the legal conditions applicable to corporate entities are retained first. Behavioral logic is used to determine whether the behavioral conditions described by the completion component can form a reasonable logical relationship with the behavior of the original node. At the same time, the applicable context is also verified. 
For example, it is determined whether the industry scenario, dissemination method, or policy background to which the supplementary conditions are applicable is consistent with the context of the original node. Completion components that do not meet the consistency conditions are eliminated, and only completion components that are consistent in subject relationship, behavioral logic, and applicable context are retained, thus forming a logically consistent set of completion components. The selected supplementary components are embedded into the dependency structure, and the legal attributes and logical relationships of the relevant nodes are updated to generate an updated legal component structure table. The retained supplementary components are then embedded into the original dependency structure according to their corresponding node positions. For example, the supplementary behavioral premise node is inserted before the behavioral node, or the supplementary liability definition node is connected between the corresponding subject node and the legal provision node, thereby supplementing the originally incomplete legal logic link. During the embedding process, the attribute information of the relevant nodes is updated simultaneously, including the applicable legal provisions, the category of the responsible subject, and the applicable conditions of the behavior, etc. The connection relationship and dependency path between nodes are recalculated, and the legal component structure table is regenerated based on the updated dependency structure, making the legal logic structure in the media content more complete and clear.
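The first step of this process, identifying nodes that do not form a complete logical link, can be sketched as a simple edge scan. The sketch assumes a hypothetical encoding in which every node has one of four kinds (`subject`, `behavior`, `premise`, `legal_constraint`) and every edge is directed; neither the kind names nor the edge direction convention comes from the description.

```python
# Hypothetical node table: node id -> node kind.
NODES = {
    "s1": "subject",
    "b1": "behavior",
    "b2": "behavior",
    "p1": "premise",
    "l1": "legal_constraint",
}

# Hypothetical directed edges (source -> target) in the dependency structure.
EDGES = [
    ("s1", "b1"),  # subject s1 performs behavior b1
    ("p1", "b1"),  # premise p1 triggers behavior b1
    ("b1", "l1"),  # behavior b1 is governed by constraint l1
    ("b2", "l1"),  # behavior b2 has a constraint but no subject or premise
]


def find_incomplete_behaviors(nodes, edges):
    """Map each behavior node to the list of link types it is missing."""
    incoming = {node: set() for node in nodes}
    outgoing = {node: set() for node in nodes}
    for source, target in edges:
        outgoing[source].add(nodes[target])
        incoming[target].add(nodes[source])

    gaps = {}
    for node, kind in nodes.items():
        if kind != "behavior":
            continue
        missing = []
        if "subject" not in incoming[node]:
            missing.append("subject")
        if "premise" not in incoming[node]:
            missing.append("premise")
        if "legal_constraint" not in outgoing[node]:
            missing.append("legal_constraint")
        if missing:
            gaps[node] = missing
    return gaps


if __name__ == "__main__":
    print(find_incomplete_behaviors(NODES, EDGES))
```

The resulting mapping plays the role of the list of incomplete nodes: each entry names a behavior node and the kinds of supplementary component that would have to be retrieved and screened for it.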
[0026] The process of generating standardized compliance reference text fragments in S3 is as follows: A rule unit consistency check is performed on the updated legal component structure table to verify whether each component conforms to the applicable conditions of the corresponding legal provisions and the case interpretation rules. Each legal component unit in the updated table is compared item by item with its corresponding legal provisions and case interpretation information. The system pre-establishes a rule unit library in the legal provisions database, where each rule unit includes the applicable subject category, behavioral norm requirements, applicable situational conditions, and corresponding legal liability boundaries. The case database likewise records the courts' interpretation rules for the applicable conditions of the provisions in similar cases. A rule matching algorithm is used to verify the subject elements, behavioral components, and situational conditions in the legal component structure table. Based on the dependency structure, connectivity analysis is performed on the logical connections between the nodes of each paragraph to determine whether the logical order between the subject's behavior and the contextual constraints is complete. This analysis traverses the node connection paths in the content compliance behavior dependency structure to identify whether a complete logical link is formed between the subject node, the behavior node, and the context node. For example, it is determined whether the subject node explicitly points to the corresponding behavior node, whether the behavior node is logically constrained by the contextual constraint node, and whether the behavior node forms a complete legal logical path with the legal responsibility node. 
If a node is found to lack necessary preceding nodes or subsequent constraint nodes during the analysis process, such as when the behavior node does not correspond to a specific subject or does not specify the applicable context, it is marked as a node with incomplete logical connection, and necessary supplementation is performed by calling the aforementioned completion mechanism. By performing boundary consistency detection, the semantic connection, legal scope boundaries, and behavioral description boundaries between paragraphs are adjusted. Boundary consistency detection is performed on the semantic connection relationships between text paragraphs to avoid problems such as confusion in legal scope or repetition of behavioral descriptions between different paragraphs. Semantic similarity analysis algorithms and rule matching algorithms can be used to compare adjacent paragraph nodes to check whether the same behavior is assigned different legal scopes in different paragraphs, or whether a legal provision is incorrectly extended to inapplicable subjects or situations in different nodes. Simultaneously, behavioral description boundaries are detected, such as identifying whether the same behavior is described repeatedly in multiple nodes or if logical conflicts occur. When the above problems are detected, corrections can be made by adjusting the semantic connection between nodes, redefining the scope of behavioral descriptions, or clarifying legal scope boundaries, making the semantic connection between different paragraphs more natural and ensuring that each behavioral description is within a reasonable legal scope. Based on the verification and adjustment results, standardized compliance reference text fragments covering subjects, behaviors, contexts, and legal constraints are generated. According to the updated legal component structure table and dependency structure, nodes conforming to the logical structure are reorganized and text generated. 
Nodes are combined according to the structural order of subject—behavior—context—legal constraint, and the structured information in the nodes is converted into natural language descriptions using a preset text generation template. During the generation process, duplicate nodes are merged, and contextual conditions are appropriately supplemented, ensuring that the generated text maintains both legal logical integrity and clear semantic expression. This forms a set of standardized compliance reference text fragments that accurately reflect compliance expressions under specific legal rules and contextual conditions.
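The final generation step can be sketched as template filling over deduplicated components. This is only an illustration: the `Component` record, the template wording, and the sample content are hypothetical, and a real template library would be considerably richer.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Component:
    """One verified legal component; the field names are assumptions."""
    subject: str
    behavior: str
    context: str
    constraint: str


# Hypothetical template in subject, behavior, context, legal constraint order.
TEMPLATE = "{subject} {behavior} {context}, subject to {constraint}."


def generate_fragments(components):
    """Render each distinct component once, preserving input order."""
    seen = set()
    fragments = []
    for component in components:
        if component in seen:  # merge exact duplicates
            continue
        seen.add(component)
        fragments.append(TEMPLATE.format(**asdict(component)))
    return fragments


if __name__ == "__main__":
    sample = Component(
        subject="An advertiser",
        behavior="may publish efficacy claims",
        context="in online advertisements",
        constraint="the substantiation requirement of the applicable provision",
    )
    for fragment in generate_fragments([sample, sample]):
        print(fragment)
```

Only exact duplicates are merged here; merging nodes that describe the same behavior in different words would require the semantic similarity analysis mentioned in the boundary consistency step.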
[0027] S4: Perform hierarchical matching and weighted comparison between the text to be reviewed and the standardized compliance reference text fragments, calculate the subject correspondence, behavioral component matching, applicable context fit and legal provision boundary compliance, and form a compliance score.
[0028] The process of generating compliance scores in S4 is as follows: The text to be reviewed is mapped and matched with standardized compliance reference text fragments at multiple levels: paragraph, sentence, and legal component. Semantic similarity and structural consistency indicators between each level are extracted. The text to be reviewed is then structured and parsed at the paragraph, sentence, and clause levels, and a corresponding numbered index is established for each level node. This structured text is then aligned with the standardized compliance reference text fragments generated in the previous steps. At the paragraph level, semantic similarity between paragraphs is calculated using a semantic vector model. At the sentence level, the consistency of sentence structure and logical order is analyzed using a syntactic structure comparison algorithm. At the legal component level, the consistency of legal semantic structure is determined by comparing information such as subject elements, behavioral components, and contextual constraints. The matching results of the three levels are uniformly recorded, and indicators such as paragraph semantic similarity, sentence structural consistency, and legal component correspondence are extracted respectively, thus forming a multi-level indicator set reflecting the overall semantic and legal structural matching degree of the text. Based on the correspondence between subjects, the consistency of behavioral descriptions, and the applicable conditions of the context, the subject correspondence degree, behavioral component matching degree, and context fit degree indicators are calculated, and weights are assigned to each indicator. A detailed analysis of the matching relationship at the legal component layer is conducted. Subject correspondence degree is calculated by identifying whether the subject entity in the text under review is consistent with the subject category in the reference text. 
For example, it is determined whether the subject in both texts belongs to the same category, such as a corporate entity, an individual, or a regulatory agency. Behavioral component matching degree is calculated by comparing information such as behavioral verbs, behavioral objects, and the logical order of behaviors, and reflects how closely the behavioral description in the text under review matches the compliance reference text. Context fit degree is calculated by performing semantic similarity analysis on contextual information such as time conditions, policy background, and applicable scenarios. After calculation, weights are assigned to each indicator according to preset weight rules. For example, the subject correspondence degree weight is set to 0.3, the behavioral component matching degree weight is set to 0.4, and the context fit degree weight is set to 0.3, thus establishing a unified indicator evaluation system for comprehensive scoring calculation. The matching results are then checked for compliance with the boundaries of the legal provisions cited in the text to determine whether the behavioral descriptions exceed or deviate from the scope of application of the regulations. The names, clause numbers, or policy document citations of the legal provisions involved in the text under review are identified and matched with the scope of application information in the legal provisions database, and the subject categories, behavioral types, and situational conditions to which the provisions apply are analyzed. The scope of application is compared with the behavioral descriptions in the text under review to determine whether the text extends a legal provision to inapplicable subjects or irrelevant behavioral situations. 
If the behavioral descriptions are found to be inconsistent with the scope of application of the provisions or to exceed the boundaries of the provisions, the matching results are marked as deviations, and the corresponding degree of deviation is recorded to reflect the compliance risks of the text at the level of legal application. A comprehensive compliance score is then calculated from the above indicators: the indicators of subject correspondence, behavioral component matching, situational fit, and legal boundary compliance are weighted and summed according to preset weights to obtain a comprehensive compliance score value. The results are classified into levels according to the score range. For example, a score above 80 is judged as a high compliance level, a score between 60 and 80 as a medium compliance level, and a score below 60 as a high compliance risk. The score and the corresponding indicator details are recorded together.
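The weighted summation and level classification can be sketched as follows. The weights 0.3, 0.4, and 0.3 are the example values given in the description; the weight for legal boundary compliance is an assumption, because the description gives no value for it, and dividing by the total weight is likewise an assumed way of keeping the result on a 0 to 100 scale. The handling of scores exactly equal to 60 or 80 is also a choice made for this sketch.

```python
# Weights for the first three indicators follow the example in the text.
# The boundary weight is an assumption: the text does not state one.
WEIGHTS = {
    "subject_correspondence": 0.3,
    "behavior_matching": 0.4,
    "context_fit": 0.3,
    "boundary_compliance": 0.3,  # assumed value
}


def compliance_score(indicators, weights=WEIGHTS):
    """Weighted average of indicators in [0, 1], scaled to 0-100."""
    missing = set(weights) - set(indicators)
    if missing:
        raise ValueError(f"missing indicators: {sorted(missing)}")
    total_weight = sum(weights.values())
    weighted = sum(weights[name] * indicators[name] for name in weights)
    return round(100 * weighted / total_weight, 2)


def compliance_level(score):
    """Classify a 0-100 score using the example ranges from the text."""
    if score > 80:
        return "high compliance"
    if score >= 60:
        return "medium compliance"
    return "high compliance risk"


if __name__ == "__main__":
    example = {
        "subject_correspondence": 0.9,
        "behavior_matching": 0.8,
        "context_fit": 0.7,
        "boundary_compliance": 1.0,
    }
    score = compliance_score(example)
    print(score, compliance_level(score))
```

Passing a different `weights` mapping lets the same function reproduce the three-indicator example exactly, by omitting the boundary entry.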
[0029] S5: Based on compliance scores and applicable rules, identify potential overstepping of obligations, deviations from legal concepts, and interpretational biases. Combine the rule unit weights to dynamically determine the compliance level and review grade, and generate graded compliance review results.
[0030] The process for identifying potential overstepping of obligations, deviations in legal concepts, and interpretational biases in S5 is as follows: Threshold determination is performed on each dimension indicator in the compliance scoring results to identify nodes whose subject correspondence, behavior matching, or situation fit is lower than the preset threshold; the compliance score is read and the corresponding subject correspondence, behavior component matching, and situation fit values for each node are extracted. The compliance thresholds for each dimension indicator are preset, for example, the subject correspondence threshold is set to 0.7, the behavior component matching threshold is set to 0.75, and the situation fit threshold is set to 0.7. When a node is lower than the corresponding threshold in any dimension indicator, it can be determined that the node has potential compliance deviation. These nodes that are lower than the threshold are extracted and their paragraph position, node number, and corresponding indicator value are recorded to form a set of low-match nodes, thereby achieving the initial screening of nodes that may have legal semantic deviations or compliance risks. Legal rule backtracking analysis is performed on the identified low-match nodes to determine whether there is an expansion of the scope of obligations or a conflation of legal concepts. The corresponding legal provisions and rule units can be traced based on the node's position in the dependency structure. The applicable subject scope, behavioral norms, and applicable context conditions of the provisions are retrieved from the legal provisions database. These provisions and rules are then compared with the subject, behavior, and context information described by the node. For example, if a node extends a legal obligation to an inapplicable subject, it can be determined that the scope of obligations has been expanded. 
If a node mixes different legal concepts or misinterprets the meaning of concepts, it can be identified as a conflation of legal concepts or a misinterpretation. Combined with semantic analysis algorithms, it can detect whether there are simplified or overgeneralized expressions of legal liability conditions in the text, thereby further confirming whether the node belongs to a potential obligation overstepping or legal interpretation deviation node. By combining the applicable rules and historical review cases, deviation nodes are classified into risk levels, generating a corresponding deviation node tagging table. The risk level of deviation nodes is quantitatively scored based on factors such as the sensitivity level of the provisions recorded in the rule unit, the scope of the impact of the behavior, and the importance of the relevant regulatory areas. Simultaneously, the historical review case database is searched to analyze situations where similar statements were judged as violations or risk warnings in previous reviews. The risk level is adjusted based on the corresponding handling results in historical cases. Based on the comprehensive score, deviation nodes are classified into low-risk, medium-risk, or high-risk levels, and the node number, deviation type, applicable legal provisions, risk level, and corresponding review warning information are recorded. Finally, a deviation node tagging table is generated to provide a clear basis for risk positioning in subsequent compliance review grading or manual review processes.
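The initial threshold screening can be sketched as below, using the example thresholds from the description (0.7, 0.75, and 0.7). The record layout, the field names, and the sample values are hypothetical; the backtracking analysis and risk classification that follow the screening are not modeled.

```python
# Example thresholds taken from the description.
THRESHOLDS = {
    "subject_correspondence": 0.7,
    "behavior_matching": 0.75,
    "context_fit": 0.7,
}

# Hypothetical per-node indicator values read from the compliance score.
NODE_SCORES = {
    "n1": {"paragraph": 1, "subject_correspondence": 0.90,
           "behavior_matching": 0.80, "context_fit": 0.85},
    "n2": {"paragraph": 2, "subject_correspondence": 0.65,
           "behavior_matching": 0.80, "context_fit": 0.72},
    "n3": {"paragraph": 3, "subject_correspondence": 0.70,
           "behavior_matching": 0.75, "context_fit": 0.70},
}


def low_match_nodes(node_scores, thresholds=THRESHOLDS):
    """Return a record for every node that is below any threshold."""
    flagged = []
    for node_id, record in node_scores.items():
        failing = {
            name: record[name]
            for name, limit in thresholds.items()
            if record[name] < limit
        }
        if failing:
            flagged.append({
                "node": node_id,
                "paragraph": record["paragraph"],
                "below_threshold": failing,
            })
    return flagged


if __name__ == "__main__":
    for entry in low_match_nodes(NODE_SCORES):
        print(entry)
```

A node sitting exactly on a threshold, such as `n3`, is not flagged, since the description flags only values lower than the threshold.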
[0031] The process of generating tiered compliance review results in S5 is as follows: Based on the risk level and rule unit weight of each node in the deviation node labeling table, the overall compliance risk index is calculated. The deviation node labeling table is read, and the risk level information corresponding to each node and its weight value in the rule unit are extracted. The risk level can be assigned numerical weights according to high risk, medium risk, and low risk. For example, high-risk nodes are assigned a value of 1.0, medium-risk nodes are assigned a value of 0.6, and low-risk nodes are assigned a value of 0.3. The rule unit weight is set according to the importance of the legal provisions involved in the node, the regulatory sensitive area, and the scope of social impact. For example, the rule unit weight involving financial supervision, medical advertising, or data security can be set to a higher value. All deviation nodes are traversed and calculated. The risk level value of each node is multiplied by the corresponding rule unit weight. The results of all nodes are accumulated and normalized to obtain a comprehensive risk index that reflects the overall legal compliance risk level. This index can quantify the potential risk level of the text under review at the overall legal application level. Based on the compliance risk index and preset grading standards, the content to be reviewed is graded and judged. Multiple risk level ranges are set according to actual regulatory needs or review rules. For example, when the comprehensive risk index is below 0.3, it is judged as a low-risk compliance level, indicating that the text basically complies with the relevant legal rules. When the risk index is between 0.3 and 0.6, it is judged as a medium-risk level, indicating that there are certain legal semantic deviations or imprecise expressions in the text, which need to be appropriately modified. 
When the risk index is above 0.6, it is judged as a high-risk level, indicating that there are obvious deviations in the application of law or potential violations in the text. The content to be reviewed is automatically graded according to the above ranges, thus forming a preliminary compliance level judgment result. Based on the risk level corresponding to each node, risk markers are generated for each paragraph and node of the text, and a review report containing text location, risk type, and suggested modification direction is output. The corresponding paragraph and sentence positions are located by mapping the node number to the original text structure, and nodes with risks are marked in the text, for example, by color marking or numbering to indicate the problem location. At the same time, the system generates corresponding risk descriptions based on the deviation type of the node, such as marking risk categories such as "expansion of the scope of obligations", "confusion of legal concepts" or "inconsistency in application of the situation", and generates suggested modification directions by combining the corresponding legal provisions and case logic, such as suggesting supplementing the applicable conditions, clarifying the main responsibility, or adjusting the scope of legal provisions cited. The above information is integrated into a structured review report, which includes text location, risk type, corresponding legal provisions, and suggested modification schemes, thereby providing specific guidance for compliance revisions before content is published. The review results from each node are aggregated to generate the final tiered compliance review result. The review results from all nodes are then analyzed and summarized, unifying the risk level, risk type, and corresponding legal provisions for each node. Combined with the previously calculated overall compliance risk index and tiered judgment results, a final compliance review conclusion is formed. 
This conclusion is then output in structured data format, including information such as the overall compliance level, the number of risk nodes, the distribution of each risk type, and recommended key rectification areas. A complete review record is also generated, forming a tiered compliance review result. This not only comprehensively reflects the overall compliance status of the text but also clearly identifies specific risky paragraphs and corresponding legal issues, thereby achieving a systematic compliance review and risk warning before media content is published.
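The risk index calculation can be sketched with the example values from the description: risk values of 1.0, 0.6, and 0.3 for high, medium, and low risk, and grade boundaries at 0.3 and 0.6. The description says only that the accumulated result is normalized; dividing by the sum of rule unit weights is an assumption made here, and the sample rule unit weights are hypothetical.

```python
# Example risk values taken from the description.
RISK_VALUES = {"high": 1.0, "medium": 0.6, "low": 0.3}


def risk_index(deviation_nodes):
    """Rule-weighted average risk value.

    deviation_nodes is a list of (risk_level, rule_unit_weight) pairs.
    Normalizing by the total weight is an assumption of this sketch.
    """
    total_weight = sum(weight for _level, weight in deviation_nodes)
    if total_weight == 0:
        return 0.0
    weighted = sum(RISK_VALUES[level] * weight for level, weight in deviation_nodes)
    return weighted / total_weight


def risk_grade(index):
    """Grade an index using the example ranges from the description."""
    if index < 0.3:
        return "low"
    if index <= 0.6:
        return "medium"
    return "high"


if __name__ == "__main__":
    # Hypothetical deviation nodes with their rule unit weights.
    nodes = [("high", 1.0), ("medium", 0.5), ("low", 0.5)]
    index = risk_index(nodes)
    print(round(index, 3), risk_grade(index))
```

With this normalization the index depends on the mix of risk levels rather than on the number of deviation nodes, so a text with a single low-risk node scores 0.3. An implementation that wants the count of deviations to matter would normalize differently, for example against the total number of nodes in the text.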
[0032] Example 2: As shown in Figure 2, a media content compliance review system based on legal rules includes: an element analysis module, which breaks down the media content to be reviewed into nodes and marks the subject, behavior, and contextual elements to generate a set of legal elements; a dependency building module, which performs cross-paragraph matching on the set of legal elements and establishes a content compliance dependency structure by combining case law and the relationships between provisions; a rule completion module, which, based on the dependency structure, completes the rules for missing or incomplete legal components and generates standardized compliance reference text fragments; a compliance assessment module, which performs hierarchical matching between the text to be reviewed and the reference text and calculates the compliance indicators to form a compliance score; and a risk assessment module, which identifies deviation points based on the compliance score and determines the corresponding compliance level and graded review result.
[0033] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus.
[0034] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.
Claims
1. A media content compliance review method based on legal rules, characterized in that it includes the following steps: the media content to be reviewed is broken down into nodes at the paragraph and syntactic levels, subjects, behaviors, and contextual elements are marked together with regions of uncertain context, and a set of legal elements is formed by combining the scope of application and the degree of sensitivity of the regulations; cross-paragraph dependency checks and multi-level matching are performed on the set of legal elements, and a dependency structure for content compliance behavior is established by combining the nested structure of case precedents, the citation relationships of provisions, and the logical relationships of context; based on the dependency structure, constraints are filled in for missing, ambiguous, or logically incomplete legal components in the text, and standardized compliance reference text fragments are generated through rule unit consistency verification, paragraph logical connectivity analysis, and boundary consistency detection; the text to be reviewed is hierarchically matched and weighted against the standardized compliance reference text fragments to calculate the subject correspondence, behavioral component matching degree, applicable situation fit, and legal provision boundary compliance, forming a compliance score; and, based on the compliance score and applicable rules, potential overstepping of obligations, deviations from legal concepts, and interpretational biases are identified, and the compliance level and review grade are dynamically determined by combining the rule unit weights, generating a graded compliance review result.
2. The media content compliance review method based on legal rules according to claim 1, characterized in that the process of forming the set of legal elements is as follows: for the media content to be reviewed, a multi-level text node structure is established according to paragraphs, sentences, and clauses; the subject, behavior, and contextual constraints of each node are extracted, and the syntactic dependencies and semantic relationships between nodes are marked; based on the scope of application information in the current legal provisions database, a regulatory adaptation analysis is conducted on the subject categories, behavior types, and dissemination contexts involved in the nodes, and the nodes are classified into risk levels based on preset sensitivity indicators, forming a node attribute labeling table; the node attribute labeling table is integrated with the syntactic dependencies to identify uncertain regions in the text that have semantic jumps, ambiguous referents, or incomplete contexts, and a node feature matrix is constructed based on the semantic correlation between nodes; and the legal attributes of each node are aggregated based on the node feature matrix to generate a set of legal elements containing subject elements, behavioral components, contextual limitations, and risk level information.
3. The media content compliance review method based on legal rules according to claim 2, characterized in that the process of performing cross-paragraph dependency checks and multi-level matching on the set of legal elements is as follows: the node feature matrix in the set of legal elements is indexed and sorted according to paragraph order, and nodes in different paragraphs that have subject association, behavioral causal, or situational continuity relationships are initially screened; based on the indicators of subject consistency, behavioral logic relevance, and contextual continuity, multi-level matching calculations are performed on the screened nodes to identify legal components in the text that have cross-paragraph dependencies; the logical consistency of the matching results is evaluated to identify the subject association relationships, behavioral causal relationships, and situational continuity relationships between nodes in different paragraphs; and, based on the identified relationships, node connection edges are established to construct a cross-paragraph dependency graph.
4. The media content compliance review method based on legal rules according to claim 3, characterized in that the process of establishing the dependency structure for content compliance behavior is as follows: legal provisions are mapped to each node in the cross-paragraph dependency graph, and the subject type, behavior type, and situation category involved in the node are matched against the legal provisions database to determine the set of applicable legal provisions; by combining the judgment logic in the historical case database, cases with similar behavioral patterns or legal application scenarios are retrieved, and the nested structure and judgment conditions of the cases are extracted; by jointly mapping the citation relationships of provisions, the nested structure of case precedents, and the logical relationships between nodes, a multidimensional dependency network is constructed that describes the relationship between subject behavior and legal constraints; and, by analyzing the node connection strength and path structure in the dependency network, key legal constraint nodes and core logical paths are identified, forming the dependency structure for content compliance behavior.
5. The media content compliance review method based on legal rules according to claim 4, characterized in that the process of completing the constraints for missing, ambiguous, or logically incomplete legal components in the text is as follows: nodes in the dependency structure that do not form a complete logical link are identified, including legal components with missing behavioral premises and legal components with unclear definitions of subject responsibility; based on the key constraint nodes and applicable conditions of the identified nodes, supplementary constraints are extracted from the legal provisions library and the case law database, and a candidate list of completion components is generated; the completion components in the candidate list are screened for logical consistency to determine whether each completion component is consistent with the original nodes in terms of subject relationship, behavioral logic, and applicable scope of the context; and the selected completion components are embedded into the dependency structure, and the legal attributes and logical relationships of the relevant nodes are updated to generate an updated legal component structure table.
6. The media content compliance review method based on legal rules according to claim 5, characterized in that the process of generating standardized compliance reference text fragments is as follows: a rule unit consistency check is performed on the updated legal component structure table to check whether each component meets the applicable conditions of the corresponding legal provisions and the case interpretation rules; based on the dependency structure, a connectivity analysis is performed on the logical connections between the nodes of each paragraph to determine whether the logical order among the subject, the behavior, and the contextual constraints is complete; by means of boundary consistency detection, the semantic connections between paragraphs, the boundaries of the scope of legal application, and the boundaries of behavioral descriptions are adjusted; and based on the verification and adjustment results, standardized compliance reference text fragments covering subjects, behaviors, contexts, and legal constraints are generated.
7. The media content compliance review method based on legal rules according to claim 6, characterized in that the process of generating the compliance score is as follows: the text to be reviewed is mapped to and matched against the standardized compliance reference text fragments at multiple levels, including the paragraph, sentence, and legal component levels, and the semantic similarity and structural consistency indicators at each level are extracted; based on the correspondence between subjects, the consistency of behavioral descriptions, and the applicable conditions of the context, the subject correspondence degree, behavioral component matching degree, and context fit degree indicators are calculated, and a weight is assigned to each indicator; in combination with the legal provisions cited in the text, the matching results are checked against those provisions to determine whether the description of behavior exceeds or deviates from their scope of application; and a comprehensive calculation is performed on the above indicators to generate the corresponding compliance score.
8. The media content compliance review method based on legal rules according to claim 7, characterized in that the process of identifying potential overstepping of obligations, deviations from legal concepts, and interpretational biases is as follows: threshold determination is performed on each dimension indicator in the compliance score to identify nodes whose subject correspondence degree, behavioral component matching degree, or context fit degree is lower than the preset threshold; legal rule back-analysis is performed on the identified low-matching nodes to determine whether there is an expansion of the scope of obligations or a conflation of legal concepts; and based on the applicable conditions of the rules and historical review cases, the risk levels of the deviation nodes are classified, and a corresponding deviation node marking table is generated.
9. The media content compliance review method based on legal rules according to claim 8, characterized in that the process of generating the tiered compliance review results is as follows: an overall compliance risk index is calculated based on the risk level and rule unit weight of each node in the deviation node marking table; based on the compliance risk index and preset grading standards, the content to be reviewed is graded; based on the risk level corresponding to each node, risk markers are generated for each paragraph and node of the text, and a review report containing the text location, risk type, and suggested modification direction is output; and the review results of all nodes are aggregated to generate the final tiered compliance review results.
10. A media content compliance review system based on legal rules, applied to the media content compliance review method based on legal rules according to any one of claims 1-9, characterized in that the system comprises: an element analysis module, which breaks down the media content to be reviewed into nodes and marks the subject, behavior, and contextual elements to generate a set of legal elements; a dependency building module, which performs cross-paragraph matching on the set of legal elements and establishes a dependency structure for content compliance behavior by combining case law and the relationships between provisions; a rule completion module, which, based on the dependency structure, completes the constraints for missing or incomplete legal components and generates standardized compliance reference text fragments; a compliance assessment module, which performs hierarchical matching between the text to be reviewed and the reference text fragments and calculates the compliance indicators to form a compliance score; and a risk assessment module, which identifies deviation nodes based on the compliance score and determines the corresponding compliance level and tiered review results.
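For illustration only, the scoring, threshold determination, and risk grading steps recited in claims 7 to 9 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the claims do not specify the indicator weights, the preset threshold, how a risk level is derived from the low-matching dimensions, the formula of the compliance risk index, or the grading cut-offs, so every constant and every name below (`WEIGHTS`, `THRESHOLD`, `NodeScore`, `risk_index`, `grade`, and so on) is hypothetical.

```python
from dataclasses import dataclass

# Hypothetical constants: the claims leave weights, threshold, and cut-offs unspecified.
WEIGHTS = {"subject": 0.4, "behavior": 0.4, "context": 0.2}
THRESHOLD = 0.6
RISK_LEVEL_VALUE = {"low": 1, "medium": 2, "high": 3}


@dataclass
class NodeScore:
    node_id: str
    subject: float    # subject correspondence degree, in [0, 1]
    behavior: float   # behavioral component matching degree, in [0, 1]
    context: float    # context fit degree, in [0, 1]
    rule_weight: float = 1.0  # rule unit weight used in the risk index

    def indicators(self):
        return {"subject": self.subject, "behavior": self.behavior, "context": self.context}


def compliance_score(node):
    """Weighted sum of the three indicators (one possible reading of claim 7)."""
    return sum(WEIGHTS[name] * value for name, value in node.indicators().items())


def deviation_table(nodes):
    """List nodes with at least one indicator below the threshold (claim 8)."""
    table = []
    for node in nodes:
        low = [name for name, value in node.indicators().items() if value < THRESHOLD]
        if not low:
            continue
        # Assumed rule: the more dimensions fall below the threshold, the higher the risk level.
        level = {1: "low", 2: "medium", 3: "high"}[len(low)]
        table.append({
            "node_id": node.node_id,
            "low_dimensions": low,
            "risk_level": level,
            "rule_weight": node.rule_weight,
        })
    return table


def risk_index(table, nodes):
    """Assumed index: weighted risk of deviation nodes, normalised to [0, 1] over all nodes."""
    total_weight = sum(node.rule_weight for node in nodes)
    if total_weight == 0:
        return 0.0
    weighted = sum(RISK_LEVEL_VALUE[row["risk_level"]] * row["rule_weight"] for row in table)
    return weighted / (total_weight * max(RISK_LEVEL_VALUE.values()))


def grade(index):
    """Map the risk index to a tier using hypothetical cut-offs (claim 9)."""
    if index < 0.2:
        return "pass"
    if index < 0.5:
        return "revise"
    return "reject"


if __name__ == "__main__":
    nodes = [
        NodeScore("p1.s1", subject=0.9, behavior=0.8, context=0.9),
        NodeScore("p2.s1", subject=0.9, behavior=0.4, context=0.5, rule_weight=2.0),
    ]
    table = deviation_table(nodes)
    print(table)
    print(grade(risk_index(table, nodes)))
```

In this sketch the deviation table keeps the node identifier alongside the low-matching dimensions, so a review report could point back to the text location of each flagged node; the legal rule back-analysis of claim 8 is not modelled.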