Data processing and explainability auditing methods, devices, media, and computer program products
By processing multi-source data and analyzing related knowledge graphs, the problem of AI review systems being unable to provide interpretable reports in the scenario of issuing science and technology innovation vouchers has been solved. This has enabled multi-source collaboration and white-box decision-making, enhancing the system's engineering feasibility and regulatory adaptability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHANGHAI R&D PUBLIC SERVICE PLATFORM MANAGEMENT CENT
- Filing Date
- 2026-03-18
- Publication Date
- 2026-06-19
AI Technical Summary
Existing AI auditing systems based on deep learning cannot provide clear and traceable auditing criteria in high-value business scenarios such as the issuance of science and technology innovation vouchers, cannot deeply penetrate and identify complex related risks, and cannot transform the black-box decisions of the model into intuitive and interpretable reports.
By acquiring multi-source business data, including images, text, and structured data, an associated knowledge graph is constructed. A graph query algorithm is used to extract associated risk scores, and combined with scores for rule violations and abnormal behavior, a comprehensive risk value is calculated to generate an interpretable report.
It enables multi-source collaboration and white-box decision-making in scenarios such as the issuance of science and technology innovation vouchers, enhances the system's engineering feasibility, meets the requirements of strict supervision and manual review, and provides a traceable and explainable risk details display.
Figure CN122243626A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of computer technology, and in particular to data processing and interpretability auditing methods, devices, media, and computer program products.
Background Technology
[0002] In high-value government and financial business scenarios such as the issuance of science and technology innovation vouchers and loan approvals, the introduction of intelligent review systems has greatly improved processing efficiency. However, because these businesses involve large amounts of fund disbursements and strict compliance audits, they place extremely high demands on the rigor, transparency, and risk control capabilities of review decisions.
[0003] Currently, most mainstream AI auditing systems based on deep learning are "black box" models, capable only of outputting a final "pass" or "reject" conclusion, without providing clear, readable judgment criteria for business personnel. When facing regulatory audits or manual reviews, the lack of a "traceable and explainable" chain of evidence makes the practical engineering implementation of intelligent systems extremely difficult. Furthermore, existing auditing methods often focus on surface-level data verification of a single enterprise, neglecting the complex network transaction risks hidden behind the enterprise, such as equity connections, transfer of benefits, and duplicate applications. Therefore, there is an urgent need for a data processing and auditing method that can deeply penetrate and identify complex interconnected risks, and transform the black-box decisions of the model into intuitive, traceable, and explainable reports.
Summary of the Invention
[0004] The purpose of this invention is to provide data processing and interpretability auditing methods, devices, media, and computer program products to solve the technical problems of information limitations in intelligent auditing and the inability of automatic auditing systems based on "black box" algorithms to provide auditing evidence that meets regulatory traceability requirements in scenarios such as the issuance of science and technology innovation vouchers.
[0005] The first aspect of this invention discloses a data processing and interpretability auditing method, the method comprising:
[0006] Obtain multi-source business data to be reviewed, including image data, text data, structured data, and relational data;
[0007] Rule elements are extracted from relevant policy texts to perform compliance comparisons between the text data and the structured data, and rule violation scores are obtained; an association knowledge graph is constructed based on the relationship data, and an association risk score is extracted using a graph query algorithm; the comprehensive risk value of the object to be reviewed is calculated by combining the rule violation score and the association risk score.
[0008] According to the preset risk weighting formula, the pre-calculated abnormal behavior score, the rule violation score, and the associated risk score are weighted and summed to obtain a comprehensive risk value;
[0009] The multi-source business data, the abnormal behavior score, the rule violation score, and the associated risk score are input into a pre-trained audit module to obtain the decision confidence level.
[0010] Based on the combined matching rules of the decision confidence level and the comprehensive risk value, the classification results for multiple preset audit conclusion categories are determined;
[0011] The contribution of each input data feature to the classification result is calculated using a post-hoc interpretation algorithm, generating an interpretable report that includes feature importance visualization and local attribution analysis.
[0012] The method according to the first aspect of the present invention further includes:
[0013] The image data is sequentially subjected to geometric correction, hybrid filtering noise reduction, and local quality enhancement, and key regions are cropped using a target detection algorithm to obtain standard image data;
[0014] By using a natural language processing component that incorporates domain rules, entity extraction and two-layer filtering are performed on the text data to obtain standard text data;
[0015] The structured data is normalized, and the relational data is transformed into a triplet structure.
[0016] According to a method of a first aspect of the present invention, the step of sequentially performing geometric correction, hybrid filtering noise reduction, and local quality enhancement on the image data, and cropping key regions using a target detection algorithm to obtain standard image data, includes:
[0017] By using edge line detection and perspective transformation matrix, the target text region in the image data is corrected to a horizontal state;
[0018] By combining Gaussian filtering and median filtering algorithms, background noise interference in the image data is eliminated.
[0019] The local brightness of the image data is optimized using a contrast-limited adaptive histogram equalization algorithm.
[0020] The target detection algorithm is used to locate key regions in the image data for cropping. When the area ratio of the key region is less than a preset area threshold, the original image is preserved and an edge cropping warning sign is generated.
[0021] According to a method of the first aspect of the present invention, the step of using a natural language processing component that incorporates domain rules to perform entity extraction and two-layer filtering on the text data to obtain standard text data includes:
[0022] The preset domain-specific dictionary is imported into the natural language processing component for word segmentation, and context matching rules are configured to identify numerical entities. The numerical entities are then converted into a unified output format.
[0023] The text data after word segmentation is filtered for redundant words by using a first stop word list containing general meaningless function words and a second stop word list containing high-frequency policy terms in the field.
[0024] According to a method of the first aspect of the present invention, the step of extracting rule elements based on relevant policy texts to perform compliance comparisons on the text data and the structured data to obtain rule violation scores includes:
[0025] The relevant policy texts are parsed using a pre-trained language model to extract keywords containing upper limit requirements and prohibitive clauses, and these keywords are then transformed into structured review rules.
[0026] The standard text data and structured data of the object to be reviewed are compared with the structured review rules, and the rule violation score is output according to the degree of violation.
[0027] According to a method of a first aspect of the present invention, the step of constructing an association knowledge graph based on the relational data and extracting association risk scores using a graph query algorithm includes:
[0028] The data transformed into the triple structure is imported into a graph database to construct the associated knowledge graph containing enterprise, institution, policy and product nodes and their associated edge features;
[0029] The graph query algorithm is executed in the associated knowledge graph to identify whether the object to be reviewed has an associated transaction network or a duplicate application status.
[0030] According to a method of the first aspect of the present invention, the step of calculating the contribution of each input data feature to the classification result using a post-hoc interpretation algorithm and generating an interpretable report including feature importance visualization and local attribution analysis includes:
[0031] Using a game theory-based feature attribution algorithm, the importance ranking of global sample input features is calculated, and the positive or negative contribution of each input data feature in a single sample to be reviewed is extracted to generate a visualization chart.
[0032] Using a local perturbation explanation algorithm, perturbation samples are generated around the feature space of the individual sample to be reviewed, and a local linear model is fitted. The core basis features and secondary influence features that lead to the classification results are converted into natural language explanation text.
[0033] The interpretability report is generated based on the classification results, the visualization charts, the natural language explanation text, and the comprehensive risk value.
[0034] A second aspect of the present invention discloses an electronic device comprising a memory storing computer-executable instructions and a processor, wherein when the instructions are executed by the processor, the electronic device performs a data processing and interpretability auditing method according to a first aspect of the present invention.
[0035] A third aspect of the present invention discloses a computer storage medium storing instructions that, when executed on a computer, cause the computer to perform a data processing and interpretability auditing method according to a first aspect of the present invention.
[0036] A fourth aspect of the present invention discloses a computer program product comprising computer-executable instructions that are executed by a processor to implement a data processing and interpretability auditing method according to a first aspect of the present invention.
[0037] The main differences and effects of this invention compared to existing technologies are as follows:
[0038] In this invention, multi-source business data to be reviewed is acquired, including image data, text data, structured data, and relational data. Rule elements are extracted based on relevant policy texts to perform compliance comparisons on the text and structured data, obtaining rule violation scores. A knowledge graph is constructed based on the relational data, and a graph query algorithm is used to extract relational risk scores. Combining the rule violation scores and relational risk scores, a comprehensive risk value for the object to be reviewed is calculated. According to a preset risk weighting formula, the pre-calculated abnormal behavior scores, rule violation scores, and relational risk scores are weighted and summed to obtain a comprehensive risk value. The multi-source business data, abnormal behavior scores, rule violation scores, and relational risk scores are input into a pre-trained review module to obtain decision confidence. Based on a combination matching rule of decision confidence and comprehensive risk value, classification results for multiple preset review conclusion categories are determined. A post-hoc explanation algorithm is used to calculate the contribution of each input data feature to the classification results, generating an interpretable report that includes feature importance visualization and local attribution analysis. This achieves multi-source collaboration and white-box decision-making in scenarios such as intelligent review of science and technology innovation voucher issuance. By standardizing preprocessing procedures, defining decision-making logic with clear parameters, and providing a traceable and explainable risk profile, the system's engineering feasibility has been enhanced, meeting stringent regulatory and manual review requirements.
Attached Figure Description
[0039] Figure 1 is a flowchart of a data processing and interpretability review method according to an embodiment of this application.
[0040] Figure 2 is an architecture diagram of a multi-source collaborative business data processing and review system according to an embodiment of this application.
[0041] Figure 3 is a hardware structure block diagram of an electronic device implementing the embodiments of this application.
Detailed Implementation
[0042] To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application will be described in further detail below with reference to the accompanying drawings.
[0043] In current government or financial review processes such as those for science and technology innovation vouchers, reliance on a single data source often results in information limitations, and automated review systems based on "black box" algorithms cannot provide review evidence that meets regulatory traceability requirements, hindering the large-scale implementation of intelligent review projects. To address these technical issues, embodiments of this application disclose a data processing and interpretability review method, which is described in detail below in conjunction with the flowchart shown in Figure 1 and the architecture diagram of the multi-source collaborative business data processing and review system shown in Figure 2. The method includes:
[0044] S101, Obtain multi-source business data to be reviewed. Multi-source business data includes image data, text data, structured data, and relational data.
[0045] Specifically, the system fully connects to business ports through the data acquisition and processing module 210, receiving image data such as qualification certificates, contract text data, structured data such as credit scores, and relationship data such as equity associations. Through multi-source collaborative access, it covers the complete dimensions required for review, effectively avoiding information blind spots caused by single data types.
[0046] S102, extract rule elements based on relevant policy texts to conduct compliance comparisons of text data and structured data, and obtain rule violation scores; construct a related knowledge graph based on relational data, and use graph query algorithms to extract related risk scores; combine rule violation scores and related risk scores to calculate the comprehensive risk value of the object to be reviewed.
[0047] Specifically, the compliance comparison branch module 220 and the associated knowledge graph module 230 play their respective roles, transforming unstructured policy guidance documents into computable rules and discrete relational data into a network topology graph, thereby calculating quantified rule violation scores and associated risk scores.
[0048] S103, according to the preset risk weighting formula, the pre-calculated abnormal behavior score, rule violation score and related risk score are weighted and summed to obtain the comprehensive risk value.
[0049] Specifically, based on the abnormal behavior characteristics calculated by the abnormal behavior analysis module 240, the comprehensive risk calculation module 250 (i.e., the risk warning module) integrates the above three types of risk scores using a specific weighted logic, serving as one of the core inputs for the fusion decision. Specifically, the system uses the following risk weighting formula: Risk Value = Rule Violation Score × 0.4 + Abnormal Behavior Score × 0.3 + Related Risk Score × 0.3. The calculated comprehensive risk value ranges from 0 to 100 points. The system then maps this risk value to corresponding risk level intervals: 0-30 points correspond to the low-risk interval, 31-60 points to the medium-risk interval, and 61-100 points to the high-risk interval. Through this calculation method, the system outputs a quantifiable comprehensive risk value with global reference value.
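The weighting formula and the level mapping described in this step can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function and constant names are hypothetical, and because the text states the level intervals only in whole points (0-30, 31-60, 61-100), the treatment of fractional values such as 30.5 is an assumption.

```python
# Weights taken from the formula stated in the text:
# Risk Value = Rule Violation x 0.4 + Abnormal Behavior x 0.3 + Related Risk x 0.3
WEIGHTS = {"rule_violation": 0.4, "abnormal_behavior": 0.3, "related_risk": 0.3}


def comprehensive_risk(rule_violation: float, abnormal_behavior: float,
                       related_risk: float) -> float:
    """Weighted sum of three scores, each expected to lie in [0, 100]."""
    scores = {
        "rule_violation": rule_violation,
        "abnormal_behavior": abnormal_behavior,
        "related_risk": related_risk,
    }
    for name, value in scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be within [0, 100], got {value}")
    # The weights sum to 1, so the result also stays within [0, 100].
    return sum(WEIGHTS[name] * value for name, value in scores.items())


def risk_level(risk_value: float) -> str:
    """Map a risk value to the low / medium / high intervals."""
    # Assumption: a fractional value between two whole-point intervals
    # (e.g. 30.5) is assigned to the higher interval.
    if risk_value <= 30:
        return "low"
    if risk_value <= 60:
        return "medium"
    return "high"
```

For example, `risk_level(comprehensive_risk(20, 10, 50))` evaluates an application whose rule violation, abnormal behavior, and related risk scores are 20, 10, and 50 points respectively.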
[0050] S104, input the multi-source business data, abnormal behavior scores, rule violation scores, and associated risk scores into a pre-trained audit module to obtain decision confidence.
[0051] Specifically, the pre-trained review module 260 serves as the core intelligent judgment engine, receiving features and risk quantification indicators from all dimensions, using an internal multi-dimensional fusion network for feature representation and reasoning, and outputting decision confidence levels (such as confidence levels for approval and rejection) for different review conclusion categories.
[0052] S105, based on the combined matching rules of decision confidence and comprehensive risk value, determine the classification results for multiple preset audit conclusion categories.
[0053] Specifically, the combination matching classification module 270 combines the soft decision confidence level output by the pre-trained review module 260 with the hard business comprehensive risk value output by the comprehensive risk calculation module 250, and intercepts work orders with high risk or insufficient confidence according to the set thresholds, thereby outputting the final definitive classification result (such as "passed" or "pending manual review").
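One way to read the combined matching rule is as a pair of threshold checks. The sketch below is illustrative only: the text does not disclose the threshold values, so `min_confidence=0.9` and `max_risk=60.0` are assumptions (the latter chosen so that the high-risk interval is always intercepted), and the function name is hypothetical.

```python
def classify(pass_confidence: float, risk_value: float,
             min_confidence: float = 0.9, max_risk: float = 60.0) -> str:
    """Combine the model's confidence with the rule-based risk value.

    A work order is intercepted for manual review when either the risk
    value is too high or the model is not confident enough; otherwise it
    passes automatically. Both thresholds are assumed values.
    """
    if risk_value > max_risk or pass_confidence < min_confidence:
        return "pending manual review"
    return "passed"
```

Keeping the two conditions joined by `or` means that a confident model cannot override a high business risk value, which matches the interception behavior described for module 270.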
[0054] S106, use a post-hoc interpretation algorithm to calculate the contribution of each input data feature to the classification result, and generate an interpretable report that includes feature importance visualization and local attribution analysis.
[0055] Specifically, the post-hoc explanation generation module 280 performs reverse attribution analysis on the review decision-making process. It extracts the specific impact of multi-source features on the decision-making process and ultimately outputs an intuitive, business-readable, and interpretable report.
[0056] The method according to the embodiments of this application realizes multi-source collaboration and white-box decision-making in scenarios such as intelligent review of science and technology innovation voucher issuance. Through standardized preprocessing, decision fusion logic with clear parameters, and "traceable and explainable" risk detail display, the system's engineering feasibility is enhanced, meeting the requirements of strict supervision and manual review.
[0057] The method according to the embodiments of this application further includes: sequentially performing geometric correction, hybrid filtering noise reduction, and local quality enhancement on the image data, and cropping key regions using an object detection algorithm to obtain standard image data; using a natural language processing component that introduces domain rules to perform entity extraction and two-layer filtering on the text data to obtain standard text data; normalizing the structured data, and converting the relational data into a triplet structure.
[0058] Specifically, the data acquisition and processing module 210 includes customized and standardized preprocessing pipelines for various data types. For structured data, the system performs Z-score normalization on continuous features such as credit scores and registered capital, and performs one-hot encoding on discrete features such as "whether there are administrative penalties" and "industry affiliation." Missing values are filled using the median (for continuous data) or mode (for discrete data). For relational data, the system organizes and transforms it into a standardized triple format of "enterprise-related party-relationship type" for direct use by subsequent modules.
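The structured-data and relational-data steps above can be sketched with the Python standard library. This is a minimal sketch, not the described system: all function names and record keys are hypothetical, missing values are represented as `None`, and the use of the population standard deviation for the Z-score is an assumption, since the text does not say which variant is used.

```python
from statistics import mean, median, mode, pstdev


def fill_and_zscore(values):
    """Fill missing continuous values with the median, then Z-score normalize."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    filled = [fill if v is None else v for v in values]
    mu = mean(filled)
    sigma = pstdev(filled)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0 for _ in filled]  # constant column carries no signal
    return [(v - mu) / sigma for v in filled]


def fill_and_one_hot(values):
    """Fill missing discrete values with the mode, then one-hot encode."""
    observed = [v for v in values if v is not None]
    fill = mode(observed)
    filled = [fill if v is None else v for v in values]
    categories = sorted(set(filled))
    rows = [[1 if v == c else 0 for c in categories] for v in filled]
    return categories, rows


def to_triple(record):
    """Convert a relation record to (enterprise, related party, relationship type)."""
    return (record["enterprise"], record["related_party"], record["relation_type"])
```

For instance, `fill_and_zscore([70, None, 90])` first replaces the missing credit score with the median of the observed values and then centers and scales the column.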
[0059] This embodiment eliminates the differences in dimensions, formats, and noise interference between heterogeneous multi-source data through targeted preprocessing methods, providing a highly clean and structurally unified high-quality data foundation for complex downstream rule matching and model inference.
[0060] According to the method of this application embodiment, geometric correction, hybrid filtering noise reduction, and local quality enhancement are sequentially performed on image data, and key regions are cropped using a target detection algorithm to obtain standard image data. This includes: correcting the target text region in the image data to a horizontal state using edge line detection and perspective transformation matrix; combining Gaussian filtering and median filtering algorithms to eliminate background interference noise in the image data; optimizing the local brightness of the image data using a contrast-limited adaptive histogram equalization algorithm; locating key regions in the image data using a target detection algorithm for cropping, and retaining the original image and generating an edge cropping warning indicator when the calculated area ratio of the key region is less than a preset area threshold.
[0061] In its implementation, the system, based on the YOLOv5 architecture, implements a standardized four-step process: "correction → noise reduction → enhancement → cropping." First, the Hough transform is used to detect straight lines at image edges and identify tilt or perspective distortion within ±15°. A perspective transformation matrix is then used to correct the text area of the qualification certificate or contract to a horizontal, squared-up state, so that key information is free from geometric distortion. Second, a 3×3 Gaussian filter removes Gaussian noise and a 5×5 median filter removes salt-and-pepper noise, which together suppress interference such as reflections on scanned documents and shadows in photographs. Third, the CLAHE algorithm (Contrast Limited Adaptive Histogram Equalization) is used to improve the overall contrast between text and background, specifically optimizing brightness in unevenly lit areas to enhance text edges and seal details. Finally, using YOLOv5's object detection function, the system locates key regions (ROIs) such as the borders of the qualification certificate, the issuing authority's seal, and the validity period markings, and automatically crops away invalid and redundant background such as the desktop. The cropping area threshold is set to "target area ratio ≥ 85%". If the ratio is below this threshold, the crop is considered incomplete; the system then retains the original image and marks it with an "edge cropping warning" so that no core information is lost.
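The fallback logic at the end of the cropping step can be sketched as below. Only the 85% threshold and the warning label come from the text; the class and function names are hypothetical, and the sketch deliberately takes the area ratio as an input because the text does not define how the ratio is measured.

```python
from dataclasses import dataclass
from typing import Optional

AREA_THRESHOLD = 0.85  # "target area ratio >= 85%" as stated in the text


@dataclass(frozen=True)
class CropDecision:
    use_cropped: bool        # True: keep the cropped region; False: keep the original
    warning: Optional[str]   # warning label attached to the image, if any


def decide_crop(area_ratio: float, threshold: float = AREA_THRESHOLD) -> CropDecision:
    """Keep the crop only when the key region's area ratio meets the threshold."""
    if not 0.0 <= area_ratio <= 1.0:
        raise ValueError(f"area_ratio must be within [0, 1], got {area_ratio}")
    if area_ratio >= threshold:
        return CropDecision(use_cropped=True, warning=None)
    # Below the threshold the crop is treated as incomplete: fall back to the
    # original image and flag it for downstream reviewers.
    return CropDecision(use_cropped=False, warning="edge cropping warning")
```

Returning the warning alongside the decision keeps the flag attached to the image record, so later steps can surface it in the review report.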
[0062] The in-depth image preprocessing pipeline in this embodiment improves the clarity and standardization of image features, reduces the probability of subsequent visual models or OCR components being affected by reflections, deformations, and redundant background interference, and ensures effective focusing of information in the core area.
[0063] According to the method of this application embodiment, a natural language processing component that introduces domain rules is used to extract entities and perform two-layer filtering on text data to obtain standard text data. The method includes: importing a preset domain-specific dictionary into the natural language processing component for word segmentation, configuring context matching rules to identify numerical entities, and converting the numerical entities into a unified format for output; and using a first stop word list containing general meaningless function words and a second stop word list containing high-frequency policy words in the domain to filter redundant words in the segmented text data.
[0064] Specifically, the system uses the spaCy component for text optimization. First, the system constructs a domain-specific dictionary containing more than 200 core terms such as "R&D investment", "technology service contract", "high-tech enterprise", "industry-university-research cooperation agreement", etc., and imports it through the Vocab interface of spaCy, effectively avoiding the incorrect segmentation of professional terms. At the same time, by configuring regular expressions and context matching rules through EntityRuler, numerical entities such as "subsidy amount (e.g., 500,000 yuan)", "qualification validity period (e.g., 2023 - 2025)", "equity ratio (e.g., 35%)" are accurately identified and standardized and output in a unified format such as "numerical value + unit", "YYYY-MM-DD", etc. In the filtering stage, the system applies a two-layer stopword list of "general + domain": the first layer filters out meaningless general function words such as "of", "is", "in", etc.; the second layer targets the government affairs scenario and filters out a total of 156 redundant words that are frequent in policy documents but have no review value, such as "this measure", "relevant regulations", "attachment", "copy".
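The numerical-entity normalization and the two-layer stop word filter can be illustrated without spaCy. The sketch below swaps in plain regular expressions from Python's `re` module in place of the EntityRuler configuration, which the text does not disclose; the patterns, labels, and function names are assumptions, and the stop word sets contain only the examples quoted above rather than the full lists.

```python
import re

# Assumed patterns for two of the entity types mentioned in the text.
AMOUNT = re.compile(r"(\d{1,3}(?:,\d{3})+|\d+)\s*yuan")
PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*%")

# Layer 1: general function words; layer 2: high-frequency policy terms.
GENERAL_STOPWORDS = {"of", "is", "in"}
DOMAIN_STOPWORDS = {"this measure", "relevant regulations", "attachment", "copy"}


def normalize_numeric_entities(text):
    """Extract amounts and ratios and emit them as 'numerical value + unit'."""
    entities = []
    for match in AMOUNT.finditer(text):
        entities.append(("amount", match.group(1).replace(",", "") + " yuan"))
    for match in PERCENT.finditer(text):
        entities.append(("ratio", match.group(1) + " %"))
    return entities


def filter_tokens(tokens):
    """Apply the general stop word list first, then the domain list."""
    first_pass = [t for t in tokens if t.lower() not in GENERAL_STOPWORDS]
    return [t for t in first_pass if t.lower() not in DOMAIN_STOPWORDS]
```

Keeping the two stop word layers separate makes it possible to update the domain list when policy wording changes without touching the general list.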
[0065] In this embodiment, combining the domain-specific dictionary with the two-layer stop word filtering mechanism not only solves the technical problem of inaccurate word segmentation of rare industry-specific terms, but also focuses the computing resources of subsequent processing on the core content that has substantial review value, improving the accuracy and efficiency of text processing.
[0066] According to the method of the embodiment of the present application, rule elements are extracted from relevant policy texts to perform compliance comparison on text data and structured data, and a rule violation score is obtained, including: using a pre-trained language model to parse relevant policy text, extracting keywords containing upper limit requirements and prohibitive clauses, and converting the keywords into structured review rules; comparing the standard text data and structured data of the object to be reviewed with the structured review rules, and outputting a rule violation score according to the degree of violation.
[0067] In specific implementation, the compliance comparison branch module 220 uses the BERT model to deeply parse the newly introduced policy text related to innovation vouchers, accurately extracts business keywords such as "subsidy upper limit", "qualification requirements", "prohibitive clauses", etc., and automatically translates them into structured review rules executable by a computer, such as "the annual subsidy upper limit for a single enterprise ≤ 3,000,000 yuan". Subsequently, the system uses these extracted rigid rules to strictly perform compliance verification and comparison on the data provided by the object to be reviewed, and outputs a "rule violation score" ranging from 0 to 100 points.
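A structured review rule such as "annual subsidy upper limit ≤ 3,000,000 yuan" can be represented as data and checked mechanically. The sketch below is hypothetical: the text says only that the score ranges from 0 to 100 and depends on the degree of violation, so the proportional mapping used here (percentage by which the limit is exceeded, capped at 100) is an assumption, as are the class, function, and field names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpperLimitRule:
    field: str    # name of the application field the rule constrains
    limit: float  # maximum allowed value


def violation_score(rule: UpperLimitRule, application: dict) -> float:
    """Return 0 when compliant, otherwise a score that grows with the excess."""
    value = application[rule.field]
    if value <= rule.limit:
        return 0.0
    # Assumed mapping: percentage over the limit, capped at 100 points.
    excess_ratio = (value - rule.limit) / rule.limit
    return min(100.0, 100.0 * excess_ratio)


# Rule corresponding to the example quoted in the text.
ANNUAL_SUBSIDY_RULE = UpperLimitRule(field="annual_subsidy", limit=3_000_000)
```

Representing each rule as a small immutable record means that rules extracted from a new policy text can be added to the rule set without changing the checking code.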
[0068] In this embodiment, the pre-trained model endows the system with the ability to dynamically understand and adapt to new policies, directly converting policy clauses written in human language into machine verification rules, reducing the cost of manually maintaining the rule library, and ensuring the real-time nature and rigor of the review criteria.
[0069] According to the method of this application embodiment, a related knowledge graph is constructed based on relational data, and a graph query algorithm is used to extract the related risk score, including: importing data transformed into a triple structure into a graph database, constructing a related knowledge graph containing enterprise, institution, policy and product nodes and their related edge features; executing a graph query algorithm in the related knowledge graph to identify whether the object to be reviewed has a related transaction network or a duplicate application status.
[0070] Specifically, the knowledge graph module 230, relying on graph database engines such as Neo4j, uses the preprocessed triplet data to construct a large-scale "enterprise-institution-policy-product" relationship graph. This graph stores equity relationships between enterprises and also covers complex information networks such as cooperation history and policy compatibility. Based on this, the system runs graph query algorithms or graph neural network mining technology to deeply penetrate and identify hidden related-party transaction risks or malicious duplicate application behaviors, and outputs a quantitative "related risk score" (0-100 points).
[0071] This embodiment solves the performance and logical bottlenecks of relational databases when handling complex network relationships, enabling the auditing system to penetrate surface data and detect hidden risks such as group fraud or transfer of benefits.
[0072] According to the method of this application embodiment, the step of calculating the contribution of each input data feature to the classification result using a post-hoc explanation algorithm and generating an interpretability report including feature importance visualization and local attribution analysis includes: using a game theory-based feature attribution algorithm to calculate the importance ranking of global sample input features and extracting the positive or negative contribution of each input data feature in a single sample to be reviewed, generating a visualization chart; using a local perturbation explanation algorithm to generate perturbation samples around the feature space of the single sample to be reviewed and fitting a local linear model, converting the core basis features and secondary influence features that lead to the classification result into natural language explanatory text; and generating the interpretability report based on the classification result, the visualization chart, the natural language explanatory text, and the comprehensive risk value.
[0073] Specifically, the post-hoc explanation generation module 280 combines SHAP and LIME as explanation tools. First, SHAP (Shapley value analysis based on game theory) is applied to the parallel multi-path input features of the preceding process. At the global explanation level, the Summary Plot displays the feature importance ranking of the entire sample (e.g., explicitly pointing out that "credit score" and "qualification validity period" are the top two key features); at the local explanation level, the Force Plot visualizes the positive or negative contribution of each feature in the current single sample (e.g., showing that "credit score of 89" contributes a positive weight of +0.25, while "application amount exceeding industry average" generates a negative weight of -0.18). Second, the LIME algorithm is used to generate approximately 500 small perturbation samples in the local feature space (e.g., slightly modifying the application amount or the remaining qualification validity period), and a simple, transparent linear model is used to approximate the decision boundary of the complex pre-trained model around the sample. Based on the LIME results, the system can directly generate human-readable natural language explanation text, such as: "This application is preliminarily determined to be approved, with the core basis being the enterprise credit score of 89 points (above the threshold of 80 points) and the remaining validity period of the qualification certificate of 24 months (meeting the requirement of ≥12 months); the secondary influencing factor is the absence of violations in previous applications (contribution +0.23)."
Finally, the system will output a structured "explainability report," which includes: ① natural language explanation text; ② SHAP visualization charts (Summary Plot + Force Plot); ③ detailed breakdown of the comprehensive risk value (e.g., the calculation basis for risk warning indicators such as "Rule violation score of 20 points, application amount exceeding the subsidy limit of 500,000 yuan").
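The local perturbation step and the conversion into explanatory text can be sketched as follows. This is a LIME-style illustration written from scratch rather than a call into the LIME library: the `black_box` model, the feature names and scales, the exponential proximity kernel, the 0.5 decision threshold, and the sentence template are all assumptions made for the example. Only the sample count of roughly 500 and the idea of fitting a local linear model come from the description above.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def local_surrogate(predict, x0, scales, n_samples=500, kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear model to `predict` around x0."""
    rng = random.Random(seed)
    d = len(x0)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]        # standardized offsets
        x = [x0[j] + z[j] * scales[j] for j in range(d)]   # perturbed sample
        rows.append([1.0] + z)                             # intercept + offsets
        targets.append(predict(x))
        weights.append(math.exp(-sum(v * v for v in z) / kernel_width ** 2))
    k = d + 1
    # Weighted normal equations: (X^T W X) beta = X^T W y
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * r[i] * y for r, w, y in zip(rows, weights, targets))
            for i in range(k)]
    beta = solve(xtwx, xtwy)
    return beta[0], beta[1:]


def explain(names, coefs, decision):
    """Turn local weights into text: largest weight is the core basis."""
    ranked = sorted(zip(names, coefs), key=lambda p: abs(p[1]), reverse=True)
    (core_name, core_w), secondary = ranked[0], ranked[1:]
    parts = ", ".join(f"{n} ({w:+.3f})" for n, w in secondary)
    return (f"This application is preliminarily determined to be {decision}; "
            f"the core basis is {core_name} (local weight {core_w:+.3f}); "
            f"secondary influencing factors: {parts}.")


def black_box(x):
    # Hypothetical opaque model: approval probability from three features.
    credit, months, amount = x
    logit = 0.15 * (credit - 80) + 0.05 * (months - 12) - 0.04 * (amount - 30)
    return 1.0 / (1.0 + math.exp(-logit))


names = ["credit score", "remaining validity (months)", "application amount"]
x0 = [89.0, 24.0, 40.0]       # amount in units of 10,000 yuan (assumed)
scales = [2.0, 1.0, 5.0]      # assumed perturbation size per feature

intercept, coefs = local_surrogate(black_box, x0, scales)
decision = "approved" if black_box(x0) >= 0.5 else "rejected"
text = explain(names, coefs, decision)
```

Because the offsets are standardized, the fitted coefficients are directly comparable across features, which is what allows the largest one to be reported as the core basis and the rest as secondary factors.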
[0074] This embodiment addresses the "black box" problem of AI-powered intelligent review models. Through an interpretation mechanism that integrates local evidence with global insights, highly complex feature calculations are transformed into clear, intuitive visual charts and business terminology. This not only meets the stringent decision-making traceability and auditing requirements of review processes but also significantly improves the efficiency and trustworthiness of reviews conducted by human review specialists when they take over work orders.
[0075] Figure 3 is a hardware structure block diagram of an electronic device implementing the embodiments of this application.
[0076] As shown in Figure 3, the electronic device 300 may include one or more processors 302, a system motherboard 308 connected to at least one of the processors 302, system memory 304 connected to the system motherboard 308, non-volatile memory (NVM) 306 connected to the system motherboard 308, and a network interface 310 connected to the system motherboard 308.
[0077] Processor 302 may include one or more single-core or multi-core processors. Processor 302 may include any combination of general-purpose processors and special-purpose processors (e.g., graphics processors, application processors, baseband processors, etc.). In embodiments of the invention, processor 302 may be configured to carry out one or more of the embodiments of this application.
[0078] In some embodiments, the system motherboard 308 may include any suitable interface controller to provide any suitable interface to at least one of the processors 302 and / or any suitable device or component communicating with the system motherboard 308.
[0079] In some embodiments, system motherboard 308 may include one or more memory controllers to provide an interface to system memory 304. System memory 304 may be used to load and store data and / or instructions. In some embodiments, system memory 304 of electronic device 300 may include any suitable volatile memory, such as suitable dynamic random access memory (DRAM).
[0080] The NVM 306 may include one or more tangible, non-transitory computer-readable media for storing data and / or instructions. In some embodiments, the NVM 306 may include any suitable non-volatile memory such as flash memory and / or any suitable non-volatile storage device, such as at least one of an HDD (Hard Disk Drive), a CD (Compact Disc) drive, or a DVD (Digital Versatile Disc) drive.
[0081] NVM 306 may include a portion of the storage resources on a device installed on electronic device 300, or it may be accessible by the device, but is not necessarily part of the device. For example, NVM 306 may be accessed over a network via network interface 310.
[0082] Specifically, system memory 304 and NVM 306 may each include a temporary copy and a permanent copy of instructions 320. Instructions 320 may include instructions that, when executed by at least one of the processors 302, cause electronic device 300 to perform the methods described in any embodiment of this application. In some embodiments, instructions 320, or hardware, firmware, and / or software components thereof, may additionally or alternatively be located in system motherboard 308, network interface 310, and / or processor 302.
[0083] Network interface 310 may include a transceiver for providing a radio interface to electronic device 300, thereby enabling communication with any other suitable device (e.g., front-end module, antenna, etc.) via one or more networks. In some embodiments, network interface 310 may be integrated into other components of electronic device 300. For example, network interface 310 may be integrated into at least one of processor 302, system memory 304, NVM 306, and a firmware device (not shown) having instructions, wherein when at least one of the processors 302 executes the instructions, electronic device 300 implements one or more of the embodiments of this application.
[0084] The network interface 310 may further include any suitable hardware and / or firmware to provide a multiple-input multiple-output radio interface. For example, the network interface 310 may be a network adapter, a wireless network adapter, a telephone modem, and / or a wireless modem.
[0085] In one embodiment, at least one of the processors 302 may be packaged together with one or more controllers for the system motherboard 308 to form a system-in-package (SiP). In another embodiment, at least one of the processors 302 may be integrated on the same die with one or more controllers for the system motherboard 308 to form a system-on-a-chip (SoC).
[0086] The electronic device 300 may further include an input / output (I / O) device 312 connected to the system motherboard 308. The I / O device 312 may include a user interface that enables a user to interact with the electronic device 300, and a peripheral component interface that allows peripheral components to interact with the electronic device 300. In some embodiments, the electronic device 300 may also include sensors for determining at least one of environmental conditions and location information related to the electronic device 300.
[0087] In some embodiments, I / O device 312 may include, but is not limited to, a display (e.g., a liquid crystal display, a touch screen display, etc.), a speaker, a microphone, one or more cameras (e.g., a still image camera and / or a video camera), a flash (e.g., a light-emitting diode flash), and a keyboard.
[0088] In some embodiments, the peripheral component interface may include, but is not limited to, a non-volatile memory port, an audio jack, and a power interface.
[0089] In some embodiments, the sensor may include, but is not limited to, a gyroscope sensor, an accelerometer, a proximity sensor, an ambient light sensor, and a positioning unit. The positioning unit may also be part of or interact with the network interface 310 to communicate with components of the positioning network, such as Global Positioning System (GPS) satellites.
[0090] It is understood that the structures illustrated in the embodiments of the present invention do not constitute a specific limitation on the electronic device 300. In other embodiments of this application, the electronic device 300 may include more or fewer components than illustrated, or combine some components, or split some components, or have different component arrangements. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
[0091] Program code can be applied to input instructions to perform the functions described in this invention and generate output information. The output information can be applied to one or more output devices in a known manner. For the purposes of this application, a system for processing instructions including processor 302 includes any system having a processor such as a digital signal processor (DSP), microcontroller, application-specific integrated circuit (ASIC), or microprocessor.
[0092] The program code can be implemented using a high-level procedural language or an object-oriented programming language to communicate with the processing system. Assembly language or machine language can also be used when needed. In fact, the mechanisms described in this invention are not limited to any particular programming language. In either case, the language can be a compiled language or an interpreted language.
[0093] One or more aspects of at least one embodiment can be implemented by instructions stored on a computer-readable storage medium, which, when read and executed by a processor, enable an electronic device to implement the methods of the embodiments described in this invention.
[0094] According to some embodiments of this application, a computer storage medium is disclosed, on which instructions are stored, which, when executed on a computer, cause the computer to perform a data processing and interpretability auditing method according to embodiments of this application.
[0095] The method embodiments of this application correspond to this embodiment, and this embodiment can be implemented in conjunction with the method embodiments of this application. The relevant technical details mentioned in the method embodiments of this application are still valid in this embodiment, and will not be repeated here to reduce repetition. Accordingly, the relevant technical details mentioned in this embodiment can also be applied to the method embodiments of this application.
[0096] According to some embodiments of this application, a computer program product is disclosed, including computer executable instructions that are executed by a processor to implement a data processing and interpretability auditing method according to embodiments of this application.
[0097] The method embodiments of this application correspond to this embodiment, and this embodiment can be implemented in conjunction with the method embodiments of this application. The relevant technical details mentioned in the method embodiments of this application are still valid in this embodiment, and will not be repeated here to reduce repetition. Accordingly, the relevant technical details mentioned in this embodiment can also be applied to the method embodiments of this application.
[0098] It is understood that the specific embodiments described herein are merely for illustrative purposes and not for limiting the scope of this application. Furthermore, for ease of description, the accompanying drawings show only the parts relevant to this application, and not all of the structures or processes. It should be noted that similar reference numerals and letters in the drawings denote similar items throughout this application.
[0099] It should be understood that although the terms "first," "second," etc., may be used herein to describe various features, these features should not be limited by these terms. The use of these terms is merely for distinction and should not be construed as indicating or implying relative importance. For example, without departing from the scope of the exemplary embodiments, a first feature may be referred to as a second feature, and similarly, a second feature may be referred to as a first feature.
[0100] In the description of this application, it should also be noted that, unless otherwise explicitly specified and limited, the terms "set up," "connected," and "linked" should be interpreted broadly. For example, they can refer to a fixed connection, a detachable connection, or an integral connection; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; and they can refer to the internal connection of two components. Those skilled in the art can understand the specific meaning of the above terms in this embodiment based on the specific circumstances.
[0101] The illustrative embodiments of this application include, but are not limited to, data processing and interpretability auditing methods, apparatus, media, and computer program products.
[0102] Various aspects of the illustrative embodiments will be described using terminology commonly employed by those skilled in the art to convey the essence of their work to others skilled in the art. However, it will be apparent to those skilled in the art that alternative embodiments may be practiced using only some of the described features. Specific figures and configurations are set forth for purposes of explanation in order to provide a more thorough understanding of the illustrative embodiments. However, it will be apparent to those skilled in the art that alternative embodiments may be practiced without the specific details. In some other instances, well-known features have been omitted or simplified herein to avoid obscuring the illustrative embodiments of this application.
[0103] Furthermore, the various operations will be described as multiple separate operations in a manner most conducive to understanding the illustrative embodiments; however, the order of description should not be construed as implying that these operations must depend on the order of description, and many of these operations may be performed in parallel, concurrently, or simultaneously. Moreover, the order of the operations may also be rearranged. The process may be terminated when the described operations are completed, but may also include additional steps not included in the figures. The process may correspond to a method, function, procedure, subroutine, subprogram, etc.
[0104] References to "an embodiment," "embodiment," "illustrative embodiment," etc., in this application indicate that the described embodiment may include specific features, structures, or properties; however, not every embodiment necessarily includes those specific features, structures, or properties. Furthermore, these phrases do not necessarily refer to the same embodiment. Moreover, when specific features are described in conjunction with a specific embodiment, it is within the knowledge of those skilled in the art to combine these features with other embodiments, whether or not those embodiments are explicitly described.
[0105] Unless the context otherwise specifies, the terms “comprising,” “having,” and “including” are synonyms. The phrase “A and / or B” means “(A), (B), or (A and B).”
[0106] As used herein, the term "module" may refer to, be part of, or include: a memory (shared, dedicated, or grouped), an application-specific integrated circuit (ASIC), electronic circuitry and / or a processor (shared, dedicated, or grouped) that runs one or more software or firmware programs, combinational logic circuitry, and / or other suitable components that provide the described functionality.
[0107] In the accompanying drawings, some structural or methodological features may be shown in a specific arrangement and / or order. However, it should be understood that such a specific arrangement and / or order is not necessary. Rather, in some embodiments, these features may be illustrated in a manner and / or order different from that shown in the illustrative drawings. Furthermore, the inclusion of structural or methodological features in a particular drawing does not mean that all embodiments need to include such features; in some embodiments, these features may be omitted or may be combined with other features.
[0108] In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions or programs carried or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors, etc. When the instructions or programs are run by a machine, the machine may perform the various methods described above. For example, the instructions may be distributed via a network or other computer-readable media. Therefore, machine-readable media may include, but are not limited to, any mechanism for storing or transmitting information in a machine-readable (e.g., computer-readable) form, such as floppy disks, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic cards or optical cards, flash memory, or tangible machine-readable storage used to transmit information over a network via electrical, optical, acoustic, or other forms of signals (e.g., carrier waves, infrared signals, digital signals, etc.). Therefore, machine-readable media include any form of machine-readable medium suitable for storing or transmitting electronic instructions or machine-readable (e.g., computer-readable) information.
[0109] The embodiments of this application have been described in detail above with reference to the accompanying drawings. However, the use of the technical solutions of this application is not limited to the various applications mentioned in the embodiments of this application. Various structures and modifications can be easily implemented with reference to the technical solutions of this application to achieve the various beneficial effects mentioned herein. Within the scope of knowledge possessed by those skilled in the art, all changes made without departing from the spirit of this application should be considered within the scope of this patent application.
Claims
1. A data processing and interpretability auditing method, characterized in that the method includes: obtaining multi-source business data to be reviewed, including image data, text data, structured data, and relational data; extracting rule elements from relevant policy texts to perform a compliance comparison on the text data and the structured data, and obtaining a rule violation score; constructing an associated knowledge graph based on the relational data, and extracting an associated risk score using a graph query algorithm; calculating a comprehensive risk value of the object to be reviewed by combining the rule violation score and the associated risk score, wherein, according to a preset risk weighting formula, a pre-calculated abnormal behavior score, the rule violation score, and the associated risk score are weighted and summed to obtain the comprehensive risk value; inputting the multi-source business data, the abnormal behavior score, the rule violation score, and the associated risk score into a pre-trained audit module to obtain a decision confidence level, and determining, based on combined matching rules of the decision confidence level and the comprehensive risk value, a classification result among multiple preset audit conclusion categories; and calculating the contribution of each input data feature to the classification result using a post-hoc explanation algorithm, and generating an interpretability report that includes feature importance visualization and local attribution analysis.
2. The method according to claim 1, characterized in that the method further includes: sequentially subjecting the image data to geometric correction, hybrid filtering noise reduction, and local quality enhancement, and cropping key regions using a target detection algorithm to obtain standard image data; performing entity extraction and two-layer filtering on the text data using a natural language processing component that incorporates domain rules to obtain standard text data; and normalizing the structured data and transforming the relational data into a triple structure.
3. The method according to claim 2, characterized in that the step of sequentially subjecting the image data to geometric correction, hybrid filtering noise reduction, and local quality enhancement, and cropping key regions using a target detection algorithm to obtain standard image data includes: correcting the target text region in the image data to a horizontal state using edge line detection and a perspective transformation matrix; eliminating background noise interference in the image data by combining Gaussian filtering and median filtering algorithms; optimizing the local brightness of the image data using a contrast-limited adaptive histogram equalization algorithm; and locating key regions in the image data for cropping using the target detection algorithm, wherein, when the area ratio of the key region is less than a preset area threshold, the original image is preserved and an edge cropping warning sign is generated.
4. The method according to claim 2, characterized in that the step of performing entity extraction and two-layer filtering on the text data using a natural language processing component that incorporates domain rules to obtain standard text data includes: importing a preset domain-specific dictionary into the natural language processing component for word segmentation, configuring context matching rules to identify numerical entities, and converting the numerical entities into a unified output format; and filtering redundant words from the segmented text data using a first stop word list containing general meaningless function words and a second stop word list containing high-frequency policy terms in the field.
5. The method according to claim 2, characterized in that the step of extracting rule elements from relevant policy texts to perform a compliance comparison on the text data and the structured data, and obtaining a rule violation score includes: parsing the relevant policy texts using a pre-trained language model to extract keywords containing upper limit requirements and prohibitive clauses, and transforming these keywords into structured review rules; and comparing the standard text data and the structured data of the object to be reviewed with the structured review rules, and outputting the rule violation score according to the degree of violation.
6. The method according to claim 2, characterized in that the step of constructing an associated knowledge graph based on the relational data and extracting an associated risk score using a graph query algorithm includes: importing the data transformed into the triple structure into a graph database to construct the associated knowledge graph containing enterprise, institution, policy, and product nodes and their associated edge features; and executing the graph query algorithm in the associated knowledge graph to identify whether the object to be reviewed has an associated transaction network or a duplicate application status.
7. The method according to claim 1, characterized in that the step of calculating the contribution of each input data feature to the classification result using a post-hoc explanation algorithm and generating an interpretability report including feature importance visualization and local attribution analysis includes: using a game theory-based feature attribution algorithm to calculate the importance ranking of global sample input features and to extract the positive or negative contribution of each input data feature in a single sample to be reviewed, and generating a visualization chart; using a local perturbation explanation algorithm to generate perturbation samples around the feature space of the single sample to be reviewed, fitting a local linear model, and converting the core basis features and secondary influence features that lead to the classification result into natural language explanation text; and generating the interpretability report based on the classification result, the visualization chart, the natural language explanation text, and the comprehensive risk value.
8. An electronic device, characterized in that the electronic device includes a processor and a memory storing computer-executable instructions which, when executed by the processor, cause the electronic device to perform the data processing and interpretability auditing method according to any one of claims 1-7.
9. A computer storage medium, characterized in that the computer storage medium stores instructions that, when executed on a computer, cause the computer to perform the data processing and interpretability auditing method according to any one of claims 1-7.
10. A computer program product, characterized in that it includes computer-executable instructions which, when executed by a processor, implement the data processing and interpretability auditing method according to any one of claims 1-7.