Multimodal and dual-knowledge-base enhanced method and system for compliance determination of cryptographic application security assessment evidence

By combining multimodal visual semantic parsing with dual-knowledge-base enhanced retrieval, the invention addresses information loss and inconsistent judgment results in the compliance determination of on-site evidence in cryptographic application security assessments, enabling automated and efficient compliance determination.

CN122247601A · Pending · Publication Date: 2026-06-19 · 北京市产品质量监督检验研究院

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
北京市产品质量监督检验研究院
Filing Date
2026-03-18
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In determining the compliance of on-site evidence in cryptographic application security assessments, existing technologies suffer from a semantic gap between text and images, highly subjective judgment standards, and no effective mechanism for mapping expert experience. This results in incomplete information extraction, inconsistent judgment results, and low efficiency.

Method used

A multimodal and dual-knowledge-base enhancement method is adopted. Structured information is extracted from the image through multimodal visual semantic parsing, and compliance reasoning is performed on the results of dual-knowledge-base retrieval (an industry standard knowledge base and an excellent case knowledge base) to generate the judgment result.

Benefits of technology

It automates the processing of unstructured images into structured compliance conclusions, addresses information loss, improves the objectivity and consistency of judgment results, shortens verification time, and establishes the correspondence between evidence and conclusions.

✦ Generated by Eureka AI based on patent content.

Figure CN122247601A_ABST

Abstract

This invention relates to the fields of network security assessment and artificial intelligence application technology, and in particular to a multimodal and dual-knowledge-base enhanced method and system for compliance determination of cryptographic application security assessment evidence. The technical solution comprises multimodal visual semantic parsing, dual-path knowledge retrieval, expert chain-of-thought reasoning, and structured result generation. Original unstructured image evidence is obtained from the cryptographic application security assessment site and subjected to multimodal parsing to extract structured information from the image. Dual-knowledge-base enhanced retrieval is then performed, using the extracted structured information and/or assessment indicators as the retrieval basis. By extracting structured information from images through multimodal parsing, the invention addresses the information loss of traditional OCR; by providing judgment criteria and references through dual-knowledge-base retrieval, it improves the objectivity and consistency of the judgment; and by automating the process, it shortens verification time and establishes a correspondence between evidence and conclusions.

Description

Technical Field

[0001] This invention relates to the fields of network security assessment and artificial intelligence application technology, and in particular to a multimodal and dual-knowledge-base enhanced method and system for compliance determination of cryptographic application security assessment evidence.

Background Technology

[0002] Cryptographic application security assessment is an activity that evaluates the compliance, correctness, and effectiveness of cryptographic technologies, products, and services used in networks and information systems. One of its core components is the verification of original evidence, such as on-site configuration screenshots, log records, user interface photos, and topology diagrams of the information system, to determine whether the system complies with relevant national standards such as GB/T 39786 "Information Security Technology - Basic Requirements for Cryptographic Applications in Information Systems".

[0003] Currently, the compliance determination of on-site evidence in cryptographic application security assessments relies primarily on manual review. Assessment personnel need to collect various kinds of unstructured image evidence on-site and verify each piece of evidence against the standard requirements through manual observation, recording, and comparison. However, existing technical solutions have the following shortcomings in practical applications:

[0004] First, the semantic gap between text and images leads to incomplete extraction of evidentiary information. Current technologies typically employ Optical Character Recognition (OCR) to extract text from images. However, OCR technology can only recognize discrete characters within an image and cannot understand its layout structure and semantic logic. For example, for images containing tables, OCR extraction often loses the row and column correspondences; for images containing key-value pairs, it cannot establish semantic relationships between keys and values; and for the states of visual elements such as checkboxes and radio buttons, OCR technology is completely unable to recognize them. This semantic gap between text and images causes the extracted information to lose its original layout logic, rendering it unusable in subsequent applications.

[0005] Second, the subjective nature of the judgment standards leads to inconsistent results. Standards such as GB/T 39786 typically use general, high-level statements, such as "compliant cryptographic algorithms should be used" and "secure storage of identity authentication information should be ensured," and lack quantitative judgment benchmarks for specific configuration details. Different evaluators often understand and judge the same evidence image differently, resulting in varying compliance conclusions. In addition, manual review is inefficient, is prone to omissions and misjudgments when faced with a large amount of on-site evidence, and makes it difficult to establish traceable judgment criteria.

[0006] Third, there is a lack of effective mapping mechanisms for expert experience. In recent years, some studies have attempted to introduce general-purpose large language models to assist in compliance judgments. However, general-purpose large language models lack expertise in the field of cryptographic application security, and using them directly for judgments can easily lead to "hallucination", i.e., generating judgments that are inconsistent with the facts or lack evidence. At the same time, existing technologies lack mechanisms to effectively transfer expert experience from historically successful cases to the judgment of new evidence, making it impossible to accumulate and reuse experience.

[0007] To address the aforementioned technical issues, this application proposes a multimodal and dual-knowledge-base enhanced method and system for compliance determination of cryptographic application security assessment evidence.

Summary of the Invention

[0008] The purpose of this invention is to address the lack, in existing technology, of an automated judgment scheme that simultaneously solves the semantic gap between text and images, the subjectivity of judgment criteria, and the mapping of expert experience, and to propose a multimodal and dual-knowledge-base enhanced method and system for compliance determination of cryptographic application security assessment evidence.

[0009] In a first aspect, this application provides a multimodal and dual-knowledge-base enhanced method for compliance determination of cryptographic application security assessment evidence, including the following steps:

[0010] Acquiring and parsing multimodal evidence: obtaining raw unstructured image evidence from the cryptographic application security assessment site, performing multimodal parsing on the image evidence, and extracting structured information from the image;

[0011] Performing dual-knowledge-base enhanced retrieval: using the extracted structured information and/or assessment indicators as the retrieval basis, retrieving relevant standard clause information from a preset industry standard knowledge base and similar historical case information from a preset excellent case knowledge base;

[0012] Performing compliance reasoning: based on the structured information, the standard clause information, and the historical case information, using an artificial intelligence model to reason and generate a compliance judgment result for the image evidence;

[0013] Outputting the compliance judgment result.

[0014] Optionally, the step of performing multimodal parsing on the image evidence to extract structured information from the image specifically includes:

[0015] The image evidence is analyzed to identify at least one semantic region in the image;

[0016] Based on the layout features of visual elements within the semantic region, the logical structure of the semantic region is reconstructed to generate structured evidence text.

[0017] Optionally, the semantic region includes at least one of a table region, a key-value pair region, or a text paragraph region; reconstructing the logical structure of the semantic region includes: reconstructing the row and column correspondence for the table region, and establishing the semantic association between keys and values for the key-value pair region.

[0018] Optionally, obtaining similar historical case information from the preset excellent case knowledge base specifically includes:

[0019] The structured information or the visual features of the image evidence are converted into retrieval vectors;

[0020] Retrieve one or more historical cases from the excellent case knowledge base that meet the preset conditions for semantic similarity with the retrieval vector. The historical cases include historical evidence information and their corresponding expert compliance judgment results.

[0021] Optionally, reasoning using the artificial intelligence model includes:

[0022] The structured information, the standard clause information, and the historical case information are input into the large language model;

[0023] The large language model is triggered to perform a compliance comparison of the structured information based on the standard clause information, and generates a compliance judgment result by referring to the judgment logic of the historical case information.

[0024] Optionally, triggering the large language model to perform a compliance comparison of the structured information based on the standard clause information specifically includes: guiding the large language model to perform step-by-step reasoning according to a preset chain-of-thought logic.

[0025] Optionally, the compliance determination result includes at least: a compliance conclusion, a description of the evidence supporting the conclusion, and key parameters extracted from the image evidence.

[0026] In a second aspect, this application provides a multimodal and dual-knowledge-base enhanced system for compliance determination of cryptographic application security assessment evidence, which implements the method described in the first aspect and includes:

[0027] The multimodal parsing module is used to acquire raw unstructured image evidence from the site of cryptographic application security assessment, and to perform multimodal parsing on the image evidence to extract structured information from the image;

[0028] The dual-knowledge-base retrieval module uses extracted structured information and / or evaluation indicators as the retrieval basis to obtain relevant standard clause information from the preset industry standard knowledge base and similar historical case information from the preset excellent case knowledge base.

[0029] The reasoning and judgment module, based on the structured information, the standard clause information, and the historical case information, uses an artificial intelligence model to reason and generate a compliance judgment result for the image evidence;

[0030] The result output module is used to output the compliance determination result.

[0031] Optionally, the multimodal parsing module further includes:

[0032] The image evidence is analyzed to identify at least one semantic region in the image;

[0033] Based on the layout features of visual elements within the semantic region, the logical structure of the semantic region is reconstructed to generate structured evidence text.

[0034] Optionally, the dual knowledge base retrieval module further includes:

[0035] The structured information or the visual features of the image evidence are converted into retrieval vectors;

[0036] Retrieve one or more historical cases from the excellent case knowledge base that meet the preset conditions for semantic similarity with the retrieval vector. The historical cases include historical evidence information and their corresponding expert compliance judgment results.

[0037] Compared with the prior art, this application includes at least one of the following beneficial technical effects:

[0038] This invention utilizes multimodal visual semantic parsing to extract structured information from unstructured image evidence. The method identifies semantic regions such as tables and key-value pairs in images and reconstructs the row and column relationships and key-value correspondences of tables based on geometric coordinates, thus solving the problem of information loss when extracting tables and key-value pairs using traditional OCR techniques.

[0039] This invention employs a dual-knowledge-base retrieval enhancement mechanism. It retrieves standard clauses from the industry standard knowledge base as the basis for judgment and retrieves similar historical precedents from the excellent case knowledge base as reference paradigms. This guides the large language model to perform compliance reasoning, improving the objectivity and consistency of the judgment results and reducing the judgment discrepancies caused by the abstraction of standard clauses.

[0040] This invention automates the process from inputting image evidence to outputting compliance results, eliminating the need for manual transcription of image parameters and comparison with standards one by one. This shortens the verification time for individual indicators and automatically establishes the correspondence between evidence and conclusions, facilitating subsequent auditing and review.

[0041] In summary, this invention solves the problem of information loss in traditional OCR by extracting structured information from images through multimodal parsing; it improves the objectivity and consistency of judgment by providing judgment criteria and references through dual knowledge base retrieval; and it achieves automated processing, shortens the verification time, and establishes a correspondence between evidence and conclusions.

Attached Figure Description

[0042] Figure 1 is a flowchart of the multimodal and dual-knowledge-base enhanced method for compliance determination of cryptographic application security assessment evidence.

Detailed Implementation

[0043] The following specific examples illustrate the implementation of the present invention. Those skilled in the art can easily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other different specific embodiments, and various details in this specification can also be modified or changed based on different viewpoints and applications without departing from the spirit of the present invention. It should be noted that, unless otherwise specified, the following embodiments and features described therein can be combined with each other.

[0044] Example: As shown in Figure 1, the multimodal and dual-knowledge-base enhanced method for compliance determination of cryptographic application security assessment evidence of the present invention includes the following steps: S1 multimodal visual semantic parsing, S2 dual-path knowledge retrieval, S3 expert chain-of-thought reasoning, and S4 structured result generation. The method is described in detail below.

[0045] Step S1: Multimodal visual semantic parsing, used to convert the raw, unstructured image evidence collected at the cryptographic application security assessment site into a structured text description with layout information. Specifically, the system receives raw evidence images uploaded by users, such as screenshots of communication channel server certificates, photos of the IP address configuration interfaces of both communicating parties, and on-site photos showing the port service activation status.

[0046] A multimodal model-based analysis algorithm is used to perform hierarchical processing on the image:

[0047] Region detection: Using a pre-trained multimodal layout analysis model to identify different semantic regions in an image, including but not limited to table regions, text paragraph regions, key-value (KV) regions, etc.

[0048] Structural Reconstruction: Targeted structural reconstruction is performed for each identified region type. For table regions, based on the two-dimensional geometric coordinates of the text boxes within the region, clustering and alignment algorithms are used to reconstruct the row and column correspondence of the table, solving the problem of table data disorder caused by text misalignment in traditional OCR technology. For key-value pair regions, semantic associations between keys and values are established by calculating the horizontal or vertical projection distance and relative position relationship between key elements and value elements, thereby accurately extracting structured data in the form of "parameter name: parameter value".
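
The table reconstruction described in [0048] can be illustrated with a minimal sketch. It assumes that an upstream detector has already produced text boxes with pixel coordinates; the `TextBox` type, the greedy one-dimensional clustering, the tolerance values, and the sample boxes are all hypothetical choices for illustration, not the algorithm or data of the invention.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    @property
    def cy(self) -> float:
        """Vertical centre of the box."""
        return self.y + self.h / 2


def cluster_1d(values, tol):
    """Greedy 1-D clustering over sorted values.

    A value joins the current cluster when it lies within `tol` of that
    cluster's running mean; otherwise it starts a new cluster.
    Returns the cluster centres in ascending order.
    """
    clusters = []
    for v in sorted(values):
        if clusters and abs(v - sum(clusters[-1]) / len(clusters[-1])) <= tol:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return [sum(c) / len(c) for c in clusters]


def nearest(centres, value):
    """Index of the centre closest to `value`."""
    return min(range(len(centres)), key=lambda i: abs(centres[i] - value))


def rebuild_table(boxes, row_tol=10.0, col_tol=15.0):
    """Place each text box into a row/column grid; missing cells stay empty."""
    rows = cluster_1d([b.cy for b in boxes], row_tol)
    cols = cluster_1d([b.x for b in boxes], col_tol)
    grid = [["" for _ in cols] for _ in rows]
    for b in boxes:
        grid[nearest(rows, b.cy)][nearest(cols, b.x)] = b.text
    return grid


# Illustrative boxes in arbitrary order; the last row lacks a second cell.
boxes = [
    TextBox("Enabled", 201, 49, 70, 20),
    TextBox("Algorithm", 10, 10, 80, 20),
    TextBox("SM3", 11, 90, 40, 20),
    TextBox("Status", 200, 12, 60, 20),
    TextBox("SM4", 12, 50, 40, 20),
]
table = rebuild_table(boxes)
for row in table:
    print(row)
```

Clustering rows and columns independently keeps an empty cell in its correct column instead of shifting later cells left, which is the kind of misalignment the paragraph attributes to plain OCR output.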

[0049] After the above processing, the system outputs a structured text description containing the complete layout logic, such as a JSON-formatted data object, providing high-quality input for subsequent steps.
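
As a hedged illustration of the key-value association and the JSON output mentioned above, the following sketch pairs each key with the nearest value that lies to its right on the same line or directly below it. The `Element` type, the `role` label (assumed to come from the upstream layout model), the distance rule, and the coordinates are assumptions made for this example only.

```python
import json
from dataclasses import dataclass


@dataclass
class Element:
    text: str
    x: float
    y: float
    w: float
    h: float
    role: str  # "key" or "value"; assumed to be labelled upstream


def overlap(a0, a1, b0, b1):
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def pair_distance(key, val):
    """Gap between a key and a candidate value, or None if not adjacent.

    A value qualifies when it shares the key's line and sits to its right,
    or shares the key's column and sits below it.
    """
    same_line = overlap(key.y, key.y + key.h, val.y, val.y + val.h) > 0
    same_col = overlap(key.x, key.x + key.w, val.x, val.x + val.w) > 0
    if same_line and val.x >= key.x + key.w:
        return val.x - (key.x + key.w)
    if same_col and val.y >= key.y + key.h:
        return val.y - (key.y + key.h)
    return None


def pair_key_values(elements):
    """Map each key text to the text of its nearest qualifying value."""
    keys = [e for e in elements if e.role == "key"]
    values = [e for e in elements if e.role == "value"]
    result = {}
    for k in keys:
        scored = [(pair_distance(k, v), v) for v in values]
        scored = [(d, v) for d, v in scored if d is not None]
        if scored:
            _, best = min(scored, key=lambda s: s[0])
            result[k.text.rstrip(":： ")] = best.text
    return result


# Illustrative layout: two horizontal pairs and one vertical pair.
elements = [
    Element("Model:", 10, 10, 60, 20, "key"),
    Element("SJK1234", 80, 10, 80, 20, "value"),
    Element("Certificate Number:", 10, 40, 150, 20, "key"),
    Element("SXH-202X-XXXX", 170, 41, 120, 20, "value"),
    Element("Supported algorithms", 10, 80, 150, 20, "key"),
    Element("SM2, SM3, SM4", 10, 105, 150, 20, "value"),
]
pairs = pair_key_values(elements)
print(json.dumps(pairs, ensure_ascii=False, indent=2))
```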

[0050] Step S2: Dual-path knowledge retrieval. A dual retrieval-augmented generation (dual-RAG) mechanism gives the large model two kinds of knowledge support for compliance determination: a "legal basis" and a "case reference". Specifically, a dual-path retrieval mechanism is employed:

[0051] The first path is industry standard retrieval. Based on the metadata of the current assessment object (such as "Assessment Indicator: Identity Authentication", "Assessment Level: Level 3", etc.), the system searches a preset industry standard vector library to obtain the corresponding standard clause text. For example, it retrieves specific clause requirements related to identity authentication from the national standard GB/T 39786. This vector library pre-stores the clause texts of various standards and their vectorized representations, supporting semantic-level matching.

[0052] The second path is excellent-precedent retrieval. The structured text description extracted in step S1, or the visual features of the original image evidence (generated using a visual feature extractor), is used as the query vector. A similarity search is performed in a pre-built excellent-precedent vector library to recall 1-3 semantically similar historical precedents. Each precedent includes a "historical image feature description" and its corresponding "correct judgment result given by the expert." The retrieved precedents serve as few-shot examples for in-context learning, allowing the large model to learn expert experience regarding "what configuration is compliant," thereby improving the accuracy and consistency of the judgments.
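
The dual-path retrieval of step S2 can be sketched as two similarity searches over separate collections. In the sketch below, a toy bag-of-words vector stands in for a real embedding model, and the clause and case texts are invented placeholders rather than quotations from GB/T 39786 or real precedents; `embed`, `cosine`, `top_k`, and `dual_retrieve` are hypothetical names.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query, corpus, k, min_score=0.0):
    """Return up to k corpus items whose similarity exceeds min_score."""
    q = embed(query)
    scored = [(cosine(q, embed(item["text"])), item) for item in corpus]
    scored = [s for s in scored if s[0] > min_score]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [item for _, item in scored[:k]]


# Placeholder knowledge bases (illustrative text only).
STANDARDS = [
    {"id": "clause-identity",
     "text": "identity authentication shall use cryptographic technology"},
    {"id": "clause-storage",
     "text": "important data storage shall be protected for confidentiality"},
]
CASES = [
    {"id": "case-1", "verdict": "compliant",
     "text": "commercial cryptographic product certificate smart key SM2 SM3 SM4"},
    {"id": "case-2", "verdict": "non-compliant",
     "text": "server TLS certificate using RSA 1024"},
]


def dual_retrieve(indicator, evidence_text, k_cases=3):
    """Path 1: clauses by assessment indicator. Path 2: cases by evidence."""
    clauses = top_k(indicator, STANDARDS, k=1)
    cases = top_k(evidence_text, CASES, k=k_cases)
    return clauses, cases


clauses, cases = dual_retrieve(
    "identity authentication", "smart key certificate SM2 SM3 SM4"
)
print([c["id"] for c in clauses], [c["id"] for c in cases])
```

Keeping the two collections separate means the clause path can be keyed on assessment metadata while the case path is keyed on the evidence itself, matching the two query sources described in [0051] and [0052].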

[0053] Step S3: Expert chain-of-thought reasoning. The information obtained in steps S1 and S2 is integrated to construct a prompt containing the following elements, which is then input into the large language model:

[0054] Current evidence: The structured text description output from step S1;

[0055] Judgment basis: The standard clause text retrieved in step S2;

[0056] Reference paradigm: Similar historical precedents retrieved in step S2 (including descriptions of historical evidence and expert conclusions).

[0057] Meanwhile, predefined Chain of Thought (CoT) guidance instructions are embedded in the prompt, driving the large language model to simulate the expert's thought process through step-by-step reasoning: first, the key parameters in the current evidence are analyzed; then, they are compared item by item with the normative requirements of the standard clauses; next, the judgment logic of the historical precedents is analyzed; and finally, a preliminary judgment on compliance is derived. This reasoning method helps suppress the "hallucination" that general-purpose large models are prone to in professional fields.
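
One possible way to assemble the prompt described in [0053] to [0057] is sketched below. The section headings, the wording of the step instructions, the `build_prompt` function, and the precedent record are assumptions for illustration; the clause text is the identity authentication requirement quoted in this specification. No model call is made, since the specification does not name a particular model or API.

```python
import json

# The four reasoning steps from [0057], phrased as instructions.
COT_STEPS = [
    "Analyze the key parameters in the current evidence.",
    "Compare them item by item with the requirements of the standard clauses.",
    "Analyze the judgment logic of the historical precedents.",
    "Derive a preliminary compliance judgment.",
]


def build_prompt(evidence, clauses, precedents):
    """Assemble evidence, judgment basis, precedents, and CoT steps."""
    lines = ["## Current evidence",
             json.dumps(evidence, ensure_ascii=False, indent=2), ""]
    lines.append("## Judgment basis")
    lines += [f"- {clause}" for clause in clauses]
    lines.append("")
    lines.append("## Reference paradigm")
    for p in precedents:
        lines.append(
            f"- Evidence: {p['evidence']} | Expert conclusion: {p['conclusion']}"
        )
    lines.append("")
    lines.append("## Reasoning steps")
    lines += [f"{i}. {step}" for i, step in enumerate(COT_STEPS, 1)]
    return "\n".join(lines)


prompt = build_prompt(
    evidence={"Type": "Certificate",
              "Algorithm": ["SM2", "SM3", "SM4"],
              "Status": "Active"},
    clauses=["Users shall be authenticated by a combination of two or more "
             "authentication technologies, and one of the authentication "
             "technologies shall be cryptographic technology."],
    precedents=[{"evidence": "valid product certificate listing SM2/SM3/SM4",
                 "conclusion": "compliant"}],  # hypothetical precedent
)
print(prompt)
```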

[0058] Step S4: Structured Result Generation. The large language model outputs a standardized assessment result object based on the inference results. This object must contain at least the following fields:

[0059] Compliance conclusion: Determination label, such as "compliant", "non-compliant", or "partially compliant";

[0060] Evidence description: Automatically generated natural language description used to explain the basis for the conclusion, such as "Based on image XXX, the device is configured with the SM4 algorithm, which meets confidentiality requirements";

[0061] Extracted values: Key parameters extracted from the image (such as algorithm name, certificate number, validity period, etc.) are preserved for subsequent auditing and traceability.
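
The three required fields can be modelled as a small validated record, as in the following sketch. The English field names, the `AssessmentResult` type, and the `parse_result` helper are hypothetical; only the three fields and the example labels come from the description above.

```python
import json
from dataclasses import dataclass, field

# Labels taken from the examples given in [0059].
ALLOWED_LABELS = {"compliant", "non-compliant", "partially compliant"}
REQUIRED_FIELDS = {"conclusion", "evidence_description", "extracted_values"}


@dataclass
class AssessmentResult:
    conclusion: str
    evidence_description: str
    extracted_values: dict = field(default_factory=dict)


def parse_result(raw: str) -> AssessmentResult:
    """Parse model output and reject missing fields or unknown labels."""
    data = json.loads(raw)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    label = data["conclusion"].strip().lower()
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unknown conclusion label: {label!r}")
    return AssessmentResult(
        conclusion=label,
        evidence_description=data["evidence_description"],
        extracted_values=data["extracted_values"],
    )


raw_output = json.dumps({
    "conclusion": "Compliant",
    "evidence_description": "The device is configured with the SM4 algorithm.",
    "extracted_values": {"Algorithm": ["SM2", "SM3", "SM4"]},
})
result = parse_result(raw_output)
print(result)
```

Validating the label against a closed set is one way to keep free-form model output from entering the audit trail unchecked.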

[0062] The above method can be implemented by a computer program and deployed in a cryptographic application security assessment platform to form a corresponding automated judgment system. This system includes a multimodal parsing module, a dual-knowledge-base retrieval module, a reasoning and judgment module, and a result generation module, which respectively perform the functions of steps S1 to S4 described above.

Specific Implementation

[0063] The method of the present invention will be described in detail below using a specific cryptographic application security assessment scenario as an example.

[0064] Evaluation object: Login module of application system A (hereinafter referred to as "evaluation object A").

[0065] Evaluation indicator: Identity authentication (corresponding to the third level requirement of GB/T 39786: "Users shall be authenticated by a combination of two or more authentication technologies such as passwords, cryptographic technology, and biometric technology, and one of the authentication technologies shall be cryptographic technology").

[0066] Input evidence: A scanned image uploaded by the user, containing the "Commercial Cryptographic Product Certification Certificate" for the smart cryptographic key (USB Key) used by evaluation object A.

[0067] Step S1 execution: The system receives the "Commercial Cryptographic Product Certification Certificate" image uploaded by the user. The multimodal visual parsing module is invoked to first perform region detection on the certificate layout, identifying the "Certificate Title Area," "Detailed Information Key-Value Pair Area," and "Official Seal Area." Within the "Detailed Information Key-Value Pair Area," geometric coordinate alignment is used to extract the following structured data:

[0068] Product Name: Smart Cryptographic Key

[0069] Model: SJK1234

[0070] Supported algorithms: SM2, SM3, SM4

[0071] Certificate Number: SXH-202X-XXXX

[0072] Valid until: December 31, 2028

[0073] The system outputs structured JSON data:

[0074] json

[0075] {

[0076] "Type": "Certificate",

[0077] "Algorithm": ["SM2", "SM3", "SM4"],

[0078] "Status": "Active"

[0079] }

[0080] Step S2 is executed as follows:

[0081] First path (standard retrieval): Based on the evaluation unit "identity authentication", the system retrieves the following GB/T 39786 clause from the industry standard knowledge base: "Identity authentication information shall be protected for confidentiality and integrity using compliant cryptographic algorithms during transmission and storage".

[0082] Second path (precedent retrieval): The system uses "commercial cryptographic product certificate" and "identity authentication" as search criteria and retrieves a similar historical case from the excellent case knowledge base. This case includes the structured features of the historical certificate image and the "compliant" conclusion given by experts.

[0083] Step S3 is executed:

[0084] The system constructs the above information into prompts and inputs them into a large language model. The model then executes expert reasoning logic.

[0085] Compliance comparison: According to the requirements of the regulations, SM2 and SM3 are national commercial cryptographic standard algorithms and meet the definition of "compliant cryptographic algorithms"; and the certificates are within the validity period.

[0086] Case reference: Refer to the retrieved historical examples to learn the output style and judgment results corresponding to similar inputs.

[0087] Overall assessment: This evidence is sufficient to prove that evaluation object A uses compliant cryptographic technology in the identity authentication process.

[0088] Step S4 is executed:

[0089] The system outputs the final evaluation result:

[0090] Compliance conclusion: Compliant

[0091] On-site record: Upon verification, evaluation object A uses a smart cryptographic key for identity authentication. This device holds a valid commercial cryptographic product certification certificate (No.: SXH-202X-XXXX), supports compliant algorithms such as SM2 / SM3 / SM4, and meets the requirements for cryptographic technology in identity authentication.

[0092] Extracted values: {"Algorithm": ["SM2", "SM3", "SM4"], "Certificate Status": "Valid"}.

[0093] It should be noted that the above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention. For those skilled in the art, several improvements and modifications can be made without departing from the spirit and principles of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention. For example, the excellent case retrieval in step S2 can use direct vector retrieval on image features and is not limited to text features; the large language model used in step S3 can also be replaced with other artificial intelligence models with reasoning capabilities. These variations all fall within the scope of the present invention.

[0094] This invention overcomes the semantic loss bottleneck of traditional OCR technology, achieving high-fidelity structured reconstruction of complex layout evidence. Existing general-purpose OCR technologies often only extract discrete text information, easily losing table row and column alignment relationships and key-value pair correspondences, rendering the extracted data unusable. This invention, through a semantic extraction mechanism based on multimodal visual layout analysis, utilizes geometric coordinates to construct the topological relationships of visual elements, accurately identifying different semantic regions in images such as tables, key-value pairs, and text paragraphs, and specifically reconstructing the row and column logic of tables and the mapping relationships of key-value pairs. This technical solution effectively solves the problem of compliance misjudgment caused by layout errors, providing a high-quality data foundation for subsequent accurate reasoning.

[0095] This invention addresses the challenges of subjective compliance judgments and inconsistencies in standard implementation, improving the standardization of assessment conclusions. To address the issues of abstract standards and varying judgment criteria among different personnel, this invention introduces a "dual-path knowledge retrieval enhancement" mechanism. On one hand, it retrieves from the industry standard knowledge base to provide an authoritative standard basis for judgments, ensuring that conclusions are grounded in evidence. On the other hand, it retrieves from the excellent case knowledge base and uses few-shot in-context learning to follow the historical judgment logic of experts, enabling the model to learn expert experience on "what configuration constitutes compliance." This dual-knowledge-base collaborative retrieval enhancement improves the objectivity, accuracy, and consistency of machine-generated judgments, bringing them closer to expert level.

[0096] Furthermore, this invention achieves end-to-end automation from unstructured visual evidence to structured compliance conclusions, significantly improving the efficiency of on-site compliance review operations. This invention constructs a fully automated reasoning loop, eliminating the need for manual transcription of image parameters or line-by-line comparison with standard clauses. The system can automatically complete the entire process from "image acquisition" to "layout analysis," then to "knowledge retrieval," and finally to "compliance conclusion generation." This not only significantly shortens the verification time for individual indicators and reduces the risk of omissions and misjudgments in manual review, but also automatically establishes a traceability relationship between evidence and conclusions, providing complete data support for subsequent audits, reviews, and quality supervision.

[0097] The above specific embodiments are merely several optional embodiments of the present invention. Based on the technical solutions of the present invention and the relevant teachings of the above embodiments, those skilled in the art can make various alternative improvements and combinations to the above specific embodiments.

Claims

1. A multimodal and dual-knowledge-base enhanced method for compliance determination of cryptographic application security assessment evidence, characterized in that it includes the following steps: acquiring and parsing multimodal evidence: obtaining raw unstructured image evidence from the cryptographic application security assessment site, performing multimodal parsing on the image evidence, and extracting structured information from the image; performing dual-knowledge-base enhanced retrieval: using the extracted structured information and/or assessment indicators as the retrieval basis, retrieving relevant standard clause information from a preset industry standard knowledge base and similar historical case information from a preset excellent case knowledge base; performing compliance reasoning: based on the structured information, the standard clause information, and the historical case information, using an artificial intelligence model to reason and generate a compliance judgment result for the image evidence; and outputting the compliance judgment result.

2. The multi-modal and dual-knowledge base augmented classified evidence compliance adjudication method of claim 1, wherein, The step of performing multimodal analysis on the image evidence to extract structured information from the image specifically includes: The image evidence is analyzed to identify at least one semantic region in the image; Based on the layout features of visual elements within the semantic region, the logical structure of the semantic region is reconstructed to generate structured evidence text.

3. The method for compliance determination of confidential evidence enhanced by multimodal and dual-knowledge-base approaches according to claim 2, characterized in that, The semantic region includes at least one of a table region, a key-value pair region, or a text paragraph region; the logical structure of reconstructing the semantic region includes: reconstructing the row and column correspondence for the table region, and establishing the semantic association between keys and values ​​for the key-value pair region.

4. The multimodal and dual-knowledge-base enhanced method for determining the compliance of cryptographic security assessment evidence according to claim 1, characterized in that obtaining similar historical case information from the preset excellent case knowledge base specifically includes: converting the structured information or the visual features of the image evidence into a retrieval vector; and retrieving, from the excellent case knowledge base, one or more historical cases whose semantic similarity to the retrieval vector meets a preset condition, wherein each historical case includes historical evidence information and the corresponding expert compliance determination result.
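
The retrieval in claim 4 can be illustrated with a small similarity search. This sketch assumes cosine similarity as the semantic similarity measure and a threshold plus top-k cut-off as the preset condition; the claim names neither, and the names `retrieve_cases`, `min_sim`, and `top_k` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_cases(query_vec, case_base, min_sim=0.8, top_k=3):
    """Return up to top_k cases whose similarity to query_vec is at least min_sim."""
    scored = [(cosine(query_vec, case["vector"]), case) for case in case_base]
    hits = [(score, case) for score, case in scored if score >= min_sim]
    hits.sort(key=lambda sc: sc[0], reverse=True)  # most similar first
    return [case for _, case in hits[:top_k]]
```

Each returned case carries its stored evidence and expert result, so the reasoning step can refer to how comparable evidence was decided before.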

5. The multimodal and dual-knowledge-base enhanced method for determining the compliance of cryptographic security assessment evidence according to claim 1, characterized in that reasoning with the artificial intelligence model includes: inputting the structured information, the standard clause information, and the historical case information into a large language model; and prompting the large language model to perform a compliance comparison of the structured information against the standard clause information and to generate the compliance determination result with reference to the determination logic of the historical case information.

6. The multimodal and dual-knowledge-base enhanced method for determining the compliance of cryptographic security assessment evidence according to claim 5, characterized in that prompting the large language model to perform a compliance comparison of the structured information against the standard clause information specifically includes: guiding the large language model to reason step by step according to a preset chain-of-thought logic.
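
One way to picture the inputs of claim 5 and the step-by-step guidance of claim 6 is a prompt builder. The step wording in `COT_STEPS`, the prompt layout, and the case fields `evidence` and `verdict` are all assumptions made for this sketch; the claims do not disclose the actual chain-of-thought text, and no model call is shown.

```python
# Hypothetical reasoning steps; the preset chain-of-thought logic is not given in the claims.
COT_STEPS = [
    "Identify the assessment indicator that the evidence addresses.",
    "List the key parameters found in the structured evidence.",
    "Compare each parameter with the requirement in the standard clauses.",
    "Check how similar historical cases were decided and why.",
    "State the conclusion and the evidence that supports it.",
]

def build_prompt(structured_evidence, clauses, cases):
    """Assemble evidence, clauses, cases, and numbered reasoning steps into one prompt."""
    lines = ["Structured evidence:", structured_evidence, "", "Standard clauses:"]
    lines += [f"- {clause}" for clause in clauses]
    lines += ["", "Similar historical cases:"]
    lines += [f"- {case['evidence']} => {case['verdict']}" for case in cases]
    lines += ["", "Reason step by step:"]
    lines += [f"{i}. {step}" for i, step in enumerate(COT_STEPS, start=1)]
    return "\n".join(lines)
```

The resulting string would then be sent to whichever large language model the system uses.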

7. The multimodal and dual-knowledge-base enhanced method for determining the compliance of cryptographic security assessment evidence according to claim 1, characterized in that the compliance determination result includes at least: a compliance conclusion, a description of the evidence supporting the conclusion, and key parameters extracted from the image evidence.
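
The three elements required by claim 7 map naturally onto a small record type. The class name `ComplianceResult`, its field names, and the JSON serialization are assumptions for illustration; the claim fixes only the content of the result, not its format.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class ComplianceResult:
    conclusion: str            # the compliance conclusion
    supporting_evidence: str   # description of the evidence behind the conclusion
    key_parameters: dict = field(default_factory=dict)  # parameters extracted from the image

    def to_json(self):
        """Serialize the result so it can be stored next to the original evidence."""
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)
```

Keeping the supporting evidence and the extracted parameters in the same record is what lets a reviewer trace a conclusion back to the image it came from.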

8. A multimodal and dual-knowledge-base enhanced system for determining the compliance of cryptographic security assessment evidence, used to implement the method according to any one of claims 1-7, characterized in that it comprises: a multimodal parsing module, used to acquire raw unstructured image evidence from a cryptographic application security assessment site and to perform multimodal parsing on the image evidence to extract structured information from the image; a dual-knowledge-base retrieval module, used to retrieve, with the extracted structured information and/or assessment indicators as the retrieval basis, relevant standard clause information from a preset industry standard knowledge base and similar historical case information from a preset excellent case knowledge base; a reasoning and determination module, used to reason with an artificial intelligence model, based on the structured information, the standard clause information, and the historical case information, to generate a compliance determination result for the image evidence; and a result output module, used to output the compliance determination result.

9. The multimodal and dual-knowledge-base enhanced system for determining the compliance of cryptographic security assessment evidence according to claim 8, characterized in that the multimodal parsing module is further used to: analyze the image evidence to identify at least one semantic region in the image; and, based on the layout features of the visual elements within the semantic region, reconstruct the logical structure of the semantic region to generate structured evidence text.

10. The multimodal and dual-knowledge-base enhanced system for determining the compliance of cryptographic security assessment evidence according to claim 8, characterized in that the dual-knowledge-base retrieval module is further used to: convert the structured information or the visual features of the image evidence into a retrieval vector; and retrieve, from the excellent case knowledge base, one or more historical cases whose semantic similarity to the retrieval vector meets a preset condition, wherein each historical case includes historical evidence information and the corresponding expert compliance determination result.