An environmental risk dynamic assessment and intelligent auditing method and system
By generating an audit context through multi-table intelligent parsing and multi-knowledge-base retrieval modules, and combining it with innovative algorithm models for comprehensive evaluation, the method addresses the low accuracy and poor flexibility of table recognition in traditional audit methods and achieves efficient, accurate dynamic assessment and intelligent auditing of environmental risks.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HARBIN INST OF TECH
- Filing Date
- 2026-04-08
- Publication Date
- 2026-06-12
AI Technical Summary
Traditional document review methods rely on manual processes, which cannot effectively handle complex, multi-page tables and lack analysis of the logical relationships between tables. Existing automated review systems lack flexibility and scalability, making it difficult to adapt to the differing needs of different industries and document types, and the retrieval of regulations and standards has a low recall rate.
The multi-table intelligent parsing module extracts the table structure and text content, and the multi-knowledge-base retrieval module performs semantic retrieval to generate the audit context. Audit feature vectors are generated based on the preset audit workflow configuration. The pollution intensity consistency index calculation model, the cross-table data traceability verification model, and the compliance risk dynamic assessment model are then combined to conduct a comprehensive confidence assessment, and the processing flow is triggered based on the assessment results.
It has achieved a shift from surface table recognition to deep data verification, improving the accuracy and efficiency of the review process. It automatically detects logical contradictions in the submitted data, identifies inconsistencies between multiple tables, provides early warnings of compliance risks, and avoids review chaos caused by rule conflicts.
[Figure: abstract drawing, CN122197855A]
Abstract
Description
Technical Field
[0001] This invention relates to the field of artificial intelligence technology, and in particular to a method and system for dynamic environmental risk assessment and intelligent review. Specifically, it relates to an intelligent review method and system that combines the powerful understanding capabilities of large models, a knowledge retrieval mechanism for pollution discharge permits in the environmental protection field, and flexible review workflow configuration.
Background Technology
[0002] With the deepening of digital transformation, enterprises and government departments need to process a large number of professional documents, especially application materials such as environmental impact assessment reports and pollution discharge permits. These documents typically contain a large amount of structured tabular data. Traditional document review methods mainly rely on manual work. Although existing OCR technology can recognize text content, its accuracy in recognizing complex table structures is limited, especially for tables spanning multiple pages, nested tables, and tables in scanned documents. Traditional review systems often process each table in isolation, failing to establish logical connections between tables and lacking contextual analysis. Existing automated review systems typically use hard-coded rules, which lack flexibility and scalability and make it difficult to adapt to the differentiated review needs of different industries and document types. In addition, when processing professional documents, a large number of regulations and standards need to be retrieved as review criteria. Traditional keyword-based retrieval methods have low recall rates and cannot effectively support intelligent review.
Summary of the Invention
[0003] The purpose of this invention is to solve the problems in the prior art by proposing a method and system for dynamic environmental risk assessment and intelligent auditing.
[0004] This invention is achieved through the following technical solution: This invention proposes a method for dynamic environmental risk assessment and intelligent auditing, the method comprising: Step 1: Receive the files and reference files uploaded by users for review; Step 2: Extract the table structure and text content from the document to be reviewed using the multi-table intelligent parsing module to generate structured table data; Step 3: Use the multi-knowledge base retrieval module to perform semantic retrieval on the structured table data to generate an audit context; Step 4: Process the audit context based on the preset audit workflow configuration to generate an audit feature vector; Step 5: Input the audit feature vector into the preset audit analysis model to generate preliminary audit conclusions. The audit analysis model includes a pollution intensity consistency index calculation model, a cross-table data tracing verification model, and a compliance risk dynamic assessment model. The preliminary audit conclusions include logical consistency check results, cross-table consistency check results, and risk assessment results. Step 6: Perform a comprehensive confidence assessment based on the logical consistency check results, cross-table consistency check results, and risk assessment results, and trigger the corresponding processing flow based on the assessment results.
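[0005] Furthermore, the pollution intensity consistency index calculation model is used to calculate the discharge intensity consistency index $PICI_j$ of pollutant $j$. The calculation formula is:
$$PICI_j = 1 - \frac{\left| I_j^{\text{act}} - I_j^{\text{theo}} \right|}{I_j^{\text{theo}}}$$
[0006] where $E_j^{\text{act}}$ is the actual emissions, $E_j^{\text{theo}}$ is the theoretical emissions calculated based on the emission coefficients, $I_j^{\text{act}}$ is the actual emission intensity, and $I_j^{\text{theo}}$ is the theoretical emission intensity;
[0007] $$E_j^{\text{theo}} = \sum_{i=1}^{n} P_i \cdot EF_{ij} + \sum_{k=1}^{m} M_k \cdot CF_{kj}$$
[0008] $$I_j^{\text{act}} = \frac{E_j^{\text{act}}}{\sum_{i=1}^{n} P_i}, \qquad I_j^{\text{theo}} = \frac{E_j^{\text{theo}}}{\sum_{i=1}^{n} P_i}$$
[0009] In the formulas, $P_i$ is the annual production volume of the $i$-th product, $EF_{ij}$ is the discharge coefficient of the $i$-th product for pollutant $j$, $M_k$ is the amount of the $k$-th raw material consumed, $CF_{kj}$ is the raw material conversion coefficient, $n$ is the total number of products, and $m$ is the total number of types of raw and auxiliary materials.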
[0010] Furthermore, the cross-table data tracing verification model employs an entity association network algorithm to construct an entity association network of emission outlet identifiers, facility identifiers, product identifiers, and pollutant identifiers. This verifies the consistency of data across multiple tables, and calculates the tracing confidence level for each associated entity.
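[0011] The entity association network is $G = (V, \mathcal{E})$, where $V$ is the set of associated field nodes and $\mathcal{E}$ is the set of edges. For an entity $e$, $N_e$ is the number of times it should appear in the application form and $\mathrm{match}(e, t)$ is the matching function for its $t$-th expected occurrence. The tracing confidence is:
[0012] $$\mathrm{Conf}(e) = \frac{1}{N_e} \sum_{t=1}^{N_e} \mathrm{match}(e, t)$$
Edge weight:
$$w(v_a, v_b) = \alpha \left( 1 - \frac{d_{\text{edit}}(v_a, v_b)}{L_{\max}} \right) + \beta \, \mathrm{sim}_{\text{sem}}(v_a, v_b)$$
[0013] where $d_{\text{edit}}(v_a, v_b)$ is the edit distance between the two node strings, $\mathrm{sim}_{\text{sem}}(v_a, v_b)$ is the semantic similarity between the two nodes, $L_{\max}$ is the maximum length of the two node strings, and $\alpha$ and $\beta$ are the weighting coefficients.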
[0014] Furthermore, the aforementioned compliance risk dynamic assessment model calculates a comprehensive risk index based on multiple risk factors. The calculation formula is:
[0015] The risk factors include: the risk of exceeding emission standards $R_1$, where $C_j$ is the emission concentration of pollutant $j$ and $C_j^{\text{std}}$ is the standard limit, and $C_j > C_j^{\text{std}}$ indicates that the standard has been exceeded; the risk of exceeding the total limit $R_2$, where $E^{\text{cum}}$ is the annual cumulative emissions and $E^{\text{permit}}$ is the annual emission limit specified in the permit; the monitoring compliance risk $R_3$, where $f^{\text{act}}$ is the actual monitoring frequency of the enterprise and $f^{\text{req}}$ is the monitoring frequency required by the self-monitoring guidelines for the enterprise's industry; the facility operation risk $R_4$, where $T^{\text{treat}}$ is the operating time of the treatment facilities and $T^{\text{prod}}$ is the operating time of the production facilities; the historical violation risk $R_5$, for which risk weights are assigned based on the company's historical violation records in the knowledge base; and the operating condition fluctuation risk $R_6 = \sigma_L / \mu_L$, where $\mu_L$ is the mean of the production load values at all time points within a specific statistical period and $\sigma_L$ is the sample standard deviation of the production load, which measures the average deviation of the load at each time point from the mean.
[0016] Furthermore, the method also includes: Priority thresholds are set for the inspection results of each dimension; Check whether there are rule conflicts in the inspection results of each dimension; When a rule conflict is detected, the priority threshold corresponding to the conflicting rule is obtained; The higher priority rule is executed according to the relationship between the priority thresholds, and the rule processing result is generated.
[0017] Furthermore, the comprehensive confidence assessment specifically includes: Set corresponding weight values for the inspection results of each dimension; The weight values are dynamically adjusted according to the rule type, which includes pollution intensity inspection rules, cross-table consistency rules, and risk assessment rules. The inspection results of each dimension are weighted and calculated using the adjusted weight values to obtain the comprehensive confidence assessment result.
[0018] Furthermore, the processing flow specifically includes: Obtain the output results of the dynamic compliance risk assessment model and the determination results of the comprehensive confidence assessment; Based on the output and the judgment results, you can choose to activate the automatic review channel or the manual review channel.
[0019] This invention also proposes a dynamic environmental risk assessment and intelligent auditing system, the system comprising: The file upload and management module is used to receive files and reference files uploaded by users for review. The multi-table intelligent parsing module is used to extract the table structure and text content from the documents to be reviewed and generate structured table data. A multi-knowledge base retrieval module is used to perform semantic retrieval on the structured table data and generate an audit context; A customized review workflow engine is used to process the review context based on a preset review workflow configuration to generate review feature vectors; The audit analysis model module is used to input the audit feature vector into a preset audit analysis model to generate preliminary audit conclusions. The audit analysis model includes a pollution discharge intensity consistency index calculation model, a cross-table data traceability verification model, and a compliance risk dynamic assessment model. The assessment and processing module is used to conduct a comprehensive confidence assessment based on the inspection results of various dimensions, and to trigger the corresponding processing flow based on the assessment results.
[0020] The present invention also proposes an electronic device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the environmental risk dynamic assessment and intelligent auditing method.
[0021] The present invention also proposes a computer-readable storage medium for storing computer instructions, which, when executed by a processor, implement the steps of the environmental risk dynamic assessment and intelligent auditing method.
[0022] The beneficial effects of this invention are: 1. This invention first extracts tables and identifies content from the documents to be reviewed using a multi-table intelligent parsing module, then performs semantic retrieval using a multi-knowledge base retrieval module to construct a review context. Based on a preset review workflow configuration, the review context is processed to generate a review feature vector containing business semantics. This feature vector is then input into a review analysis model composed of innovative algorithm models to generate preliminary review conclusions from multiple dimensions. Finally, a comprehensive confidence assessment of the conclusions is performed, triggering corresponding processing flows based on the assessment results. Through the collaborative work of multiple innovative algorithms, a transformation from surface-level table recognition to deep data verification is achieved, establishing a comprehensive evaluation mechanism based on multi-dimensional discharge permit data features, thus improving the accuracy and efficiency of the review process. 2. The emission intensity consistency index algorithm proposed in this invention can automatically detect logical contradictions in the declared data by comparing the actual emission amount with the theoretical emission amount calculated based on material balance; the cross-table data traceability verification algorithm is based on entity association network and can automatically identify inconsistencies in data from multiple tables; the compliance risk dynamic assessment model is based on six-dimensional risk factors and can provide early warning of compliance risks. 3. This invention sets priority thresholds for inspection results across multiple dimensions. These thresholds can be configured based on business importance and risk level. 
When executing rule checks, the system automatically detects whether there are conflicts between the judgment results of different rules. When a rule conflict is found, the priority thresholds corresponding to the relevant rules are obtained, and the execution order is determined by comparing the threshold values. This keeps rule judgments orderly and consistent and avoids the audit chaos caused by rule conflicts.
Attached Figure Description
[0023] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.
[0024] Figure 1 is a flowchart of an environmental risk dynamic assessment and intelligent auditing method according to the present invention.
Detailed Implementation
[0025] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0026] Specifically, with reference to Figure 1, this invention proposes a method for dynamic environmental risk assessment and intelligent auditing, the method comprising: Step 1: Receive the files and reference files uploaded by users for review; Step 2: Extract the table structure and text content from the document to be reviewed using the multi-table intelligent parsing module to generate structured table data; Step 3: Use the multi-knowledge base retrieval module to perform semantic retrieval on the structured table data to generate an audit context; Step 4: Process the audit context based on the preset audit workflow configuration to generate an audit feature vector; Step 5: Input the audit feature vector into the preset audit analysis model to generate preliminary audit conclusions. The audit analysis model includes a pollution intensity consistency index calculation model, a cross-table data tracing verification model, and a compliance risk dynamic assessment model. The preliminary audit conclusions include logical consistency check results, cross-table consistency check results, and risk assessment results. Step 6: Perform a comprehensive confidence assessment based on the logical consistency check results, cross-table consistency check results, and risk assessment results, and trigger the corresponding processing flow based on the assessment results.
[0027] This invention first uses a multi-table intelligent parsing module to extract tables and identify content in the documents to be reviewed, and then uses a multi-knowledge base retrieval module to perform semantic retrieval to construct a review context. Based on a preset review workflow configuration, the review context is processed to generate review feature vectors containing business semantics. These feature vectors are then input into a review analysis model composed of innovative algorithm models to generate preliminary review conclusions from several dimensions, including logical consistency, cross-table consistency, and risk assessment. Finally, a comprehensive confidence assessment of these conclusions is performed, and corresponding processing procedures are triggered based on the assessment results. Through the collaborative work of multiple innovative algorithms, a transformation from surface-level table recognition to deep data verification is achieved, establishing a comprehensive evaluation mechanism based on multi-dimensional discharge permit data features, thus improving the accuracy and efficiency of the review process.
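[0028] Furthermore, the pollution intensity consistency index calculation model is used to calculate the discharge intensity consistency index $PICI_j$ of pollutant $j$. The calculation formula is:
$$PICI_j = 1 - \frac{\left| I_j^{\text{act}} - I_j^{\text{theo}} \right|}{I_j^{\text{theo}}}$$
[0029] where $E_j^{\text{act}}$ is the actual emissions, $E_j^{\text{theo}}$ is the theoretical emissions calculated based on the emission coefficients, $I_j^{\text{act}}$ is the actual emission intensity, and $I_j^{\text{theo}}$ is the theoretical emission intensity;
[0030] $$E_j^{\text{theo}} = \sum_{i=1}^{n} P_i \cdot EF_{ij} + \sum_{k=1}^{m} M_k \cdot CF_{kj}$$
[0031] $$I_j^{\text{act}} = \frac{E_j^{\text{act}}}{\sum_{i=1}^{n} P_i}, \qquad I_j^{\text{theo}} = \frac{E_j^{\text{theo}}}{\sum_{i=1}^{n} P_i}$$
[0032] In the formulas, $P_i$ is the annual production volume of the $i$-th product, $EF_{ij}$ is the pollution discharge coefficient of the $i$-th product for pollutant $j$ (kg / ton of product), $M_k$ is the amount of the $k$-th raw material consumed, $CF_{kj}$ is the raw material conversion coefficient, $n$ is the total number of products, and $m$ is the total number of types of raw and auxiliary materials.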
[0033] The emission intensity consistency index algorithm proposed in this invention can automatically detect logical contradictions in the declared data and identify possible underreporting or concealment by comparing the actual emission amount with the theoretical emission amount calculated based on material balance.
[0034] Furthermore, the cross-table data tracing verification model employs an entity association network algorithm to construct an entity association network of emission outlet identifiers, facility identifiers, product identifiers, and pollutant identifiers. This verifies the consistency of data across multiple tables, and calculates the tracing confidence level for each associated entity.
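[0035] The entity association network is $G = (V, \mathcal{E})$, where $V$ is the set of associated field nodes and $\mathcal{E}$ is the set of edges. For an entity $e$, $N_e$ is the number of times it should appear in the application form and $\mathrm{match}(e, t)$ is the matching function for its $t$-th expected occurrence. The tracing confidence is:
[0036] $$\mathrm{Conf}(e) = \frac{1}{N_e} \sum_{t=1}^{N_e} \mathrm{match}(e, t)$$
Edge weight:
$$w(v_a, v_b) = \alpha \left( 1 - \frac{d_{\text{edit}}(v_a, v_b)}{L_{\max}} \right) + \beta \, \mathrm{sim}_{\text{sem}}(v_a, v_b)$$
[0037] where $d_{\text{edit}}(v_a, v_b)$ is the edit distance between the two node strings, $\mathrm{sim}_{\text{sem}}(v_a, v_b)$ is the semantic similarity between the two nodes, $L_{\max}$ is the maximum length of the two node strings, and $\alpha$ and $\beta$ are the preset weighting coefficients.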
[0038] This invention, based on a knowledge base design using entity association networks, not only improves the accuracy of cross-table data consistency verification but also supports more complex multi-table association analysis, automatically identifying isolated data, conflicting data, and missing data.
[0039] Furthermore, the aforementioned compliance risk dynamic assessment model calculates a comprehensive risk index based on multiple risk factors. The calculation formula is:
[0040] The risk factors include: the risk of exceeding emission standards $R_1$, where $C_j$ is the emission concentration of pollutant $j$ and $C_j^{\text{std}}$ is the standard limit, and $C_j > C_j^{\text{std}}$ indicates that the standard has been exceeded; the risk of exceeding the total limit $R_2$, where $E^{\text{cum}}$ is the annual cumulative emissions and $E^{\text{permit}}$ is the annual emission limit specified in the permit; the monitoring compliance risk $R_3$, where $f^{\text{act}}$ is the actual monitoring frequency of the enterprise and $f^{\text{req}}$ is the monitoring frequency required by the self-monitoring guidelines for the enterprise's industry (the lower the compliance rate of the monitoring frequency, the higher the risk); the facility operation risk $R_4$, where $T^{\text{treat}}$ is the operating time of the treatment facilities and $T^{\text{prod}}$ is the operating time of the production facilities; the historical violation risk $R_5$, for which risk weights are assigned based on the company's historical violation records in the knowledge base; and the operating condition fluctuation risk $R_6 = \sigma_L / \mu_L$, the coefficient of variation of production load (the greater the fluctuation, the higher the risk), where $\mu_L$ is the mean of the production load values at all time points within a specific statistical period and $\sigma_L$ is the sample standard deviation of the production load, which measures the average deviation of the load at each time point from the mean.
[0041] This invention is based on a comprehensive assessment of compliance risk levels using six risk factors, which can provide early warnings of compliance risks and offer decision support to regulatory authorities.
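The operating condition fluctuation factor is described above as the coefficient of variation of production load, using the sample standard deviation. A minimal Python sketch of that single factor follows; the function name and the sample load values are illustrative assumptions, not part of the source.

```python
from statistics import mean, stdev


def load_fluctuation_risk(loads):
    """Coefficient of variation: sample standard deviation divided by the mean.

    `loads` is a sequence of production load readings over the statistical
    period (hypothetical input format).
    """
    if len(loads) < 2:
        raise ValueError("need at least two load samples")
    mu = mean(loads)
    if mu == 0:
        raise ValueError("mean production load is zero")
    return stdev(loads) / mu


# Illustrative data: a steady plant versus one with erratic load.
steady = load_fluctuation_risk([80, 82, 79, 81])
erratic = load_fluctuation_risk([20, 95, 40, 100])
```

With these sample readings the erratic series yields the larger value, matching the statement that greater fluctuation means higher risk.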
[0042] Furthermore, the method also includes: Priority thresholds are set for the inspection results of each dimension; Check whether there are rule conflicts in the inspection results of each dimension; When a rule conflict is detected, the priority threshold corresponding to the conflicting rule is obtained; The higher priority rule is executed according to the relationship between the priority thresholds, and the rule processing result is generated.
[0043] This invention sets priority thresholds for the inspection results of each dimension, and these thresholds can be configured according to the importance of the business and the degree of risk. When performing rule checks, the system automatically detects whether there are conflicts in the judgment results of different rules. When a rule conflict is found, the priority thresholds corresponding to the relevant rules are obtained, and the execution order is determined by comparing the threshold sizes. This achieves the orderliness and consistency of rule judgment and avoids the audit chaos caused by rule conflicts.
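The conflict handling described above can be sketched as follows. This is only an illustration under stated assumptions: the `RuleResult` type, the verdict strings, and the convention that a larger priority value wins are all hypothetical, since the source does not specify them.

```python
from dataclasses import dataclass


@dataclass
class RuleResult:
    rule: str      # rule name (illustrative)
    verdict: str   # e.g. "pass" or "fail" (illustrative values)
    priority: int  # configured priority threshold; assumed: higher wins


def resolve(results):
    """Return the result to act on.

    If all verdicts agree there is no conflict and the first result is
    returned; otherwise the rule with the highest priority is executed.
    """
    if not results:
        return None
    verdicts = {r.verdict for r in results}
    if len(verdicts) == 1:
        return results[0]
    return max(results, key=lambda r: r.priority)


checks = [
    RuleResult("pollution_intensity", "pass", priority=2),
    RuleResult("cross_table", "fail", priority=5),
    RuleResult("risk_assessment", "pass", priority=3),
]
winner = resolve(checks)
```

Here the cross-table rule disagrees with the others and carries the highest priority, so its verdict is the one applied.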
[0044] Furthermore, the comprehensive confidence assessment specifically includes: Set corresponding weight values for the inspection results of each dimension; The weight values are dynamically adjusted according to the rule type, which includes pollution intensity inspection rules, cross-table consistency rules, and risk assessment rules. The inspection results of each dimension are weighted and calculated using the adjusted weight values to obtain the comprehensive confidence assessment result.
[0045] This invention first sets initial weight values for the inspection results of each dimension; the system dynamically adjusts the weights according to the current rule type being processed; for example, when performing a pollution intensity inspection, the system increases the weight of the logical consistency inspection result; when performing cross-table consistency verification, the system increases the weight ratio of the cross-table consistency inspection result; finally, the system uses the adjusted weight values to perform a weighted calculation on the results of each rule to obtain the final comprehensive confidence assessment result; this achieves flexible adjustment of the assessment criteria, enabling the system to optimize the assessment strategy according to different business scenarios.
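A minimal sketch of the dynamic weighting described above, in Python. The equal initial weights, the boost factor of 2, the renormalization step, and the mapping of the risk assessment rule type to the risk dimension are assumptions made for illustration; the source only states that the weight of the dimension matching the current rule type is increased.

```python
# Assumed initial weights for the three inspection dimensions.
BASE_WEIGHTS = {"logical": 1 / 3, "cross_table": 1 / 3, "risk": 1 / 3}

# Which dimension each rule type emphasizes (third entry is an assumption).
RULE_FOCUS = {
    "pollution_intensity": "logical",
    "cross_table_consistency": "cross_table",
    "risk_assessment": "risk",
}


def adjusted_weights(rule_type, boost=2.0):
    """Boost the dimension tied to the rule type, then renormalize to sum to 1."""
    weights = dict(BASE_WEIGHTS)
    weights[RULE_FOCUS[rule_type]] *= boost
    total = sum(weights.values())
    return {name: value / total for name, value in weights.items()}


def overall_confidence(scores, rule_type):
    """Weighted sum of per-dimension scores, each assumed to lie in [0, 1]."""
    weights = adjusted_weights(rule_type)
    return sum(weights[name] * scores[name] for name in weights)


example = overall_confidence(
    {"logical": 1.0, "cross_table": 0.0, "risk": 0.0}, "pollution_intensity"
)
```

Renormalizing keeps the result on the same scale whichever rule type is active, so a single threshold can be applied downstream.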
[0046] Furthermore, the processing flow specifically includes: Obtain the output results of the dynamic compliance risk assessment model and the determination results of the comprehensive confidence assessment; Based on the output and the judgment results, you can choose to activate the automatic review channel or the manual review channel.
[0047] The system first obtains the risk rating results of the document from the compliance risk dynamic assessment model, and then obtains the judgment results from the comprehensive confidence assessment mechanism for comprehensive analysis, and selects the most suitable processing channel.
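The channel selection can be pictured with a short Python sketch. The two cut-off values and the rule that both conditions must hold for automatic review are hypothetical; the source does not state how the risk result and the confidence result are combined.

```python
def choose_channel(risk_index, confidence, max_risk=0.3, min_confidence=0.8):
    """Route a document to automatic or manual review.

    `max_risk` and `min_confidence` are illustrative thresholds, not values
    taken from the source.
    """
    if risk_index <= max_risk and confidence >= min_confidence:
        return "automatic"
    return "manual"


low_risk_case = choose_channel(risk_index=0.1, confidence=0.95)
high_risk_case = choose_channel(risk_index=0.6, confidence=0.95)
```

Defaulting to the manual channel whenever either condition fails is a conservative design choice for a compliance setting.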
[0048] This invention also proposes a dynamic environmental risk assessment and intelligent auditing system, the system comprising: The file upload and management module is used to receive files and reference files uploaded by users for review. The multi-table intelligent parsing module is used to extract the table structure and text content from the documents to be reviewed and generate structured table data. A multi-knowledge base retrieval module is used to perform semantic retrieval on the structured table data and generate an audit context; A customized review workflow engine is used to process the review context based on a preset review workflow configuration to generate review feature vectors; The audit analysis model module is used to input the audit feature vector into a preset audit analysis model to generate preliminary audit conclusions. The audit analysis model includes a pollution discharge intensity consistency index calculation model, a cross-table data traceability verification model, and a compliance risk dynamic assessment model. The assessment and processing module is used to conduct a comprehensive confidence assessment based on the inspection results of various dimensions, and to trigger the corresponding processing flow based on the assessment results.
Examples
[0049] The technical solutions in the embodiments of the present invention are fully described below.
[0050] Example 1: In the review of pollution discharge permits, it is necessary to verify the basic logical consistency of the declared data. Traditional reviews only check the correctness of the format and content of a single table, lacking logical verification across tables. This example proposes the Pollutant Intensity Consistency Index (PICI), which performs logical consistency checks based on the material balance relationship between production capacity, raw material consumption, and emissions. The PICI algorithm includes the following steps: Step S101: Extract product capacity table data. Extract the annual production data of each product from the "Main Products and Capacity Table" and denote it $P_i$ ($i = 1, 2, \dots, n$, where $n$ is the number of product types). Simultaneously, extract the unit of measurement for each product and convert it to tons per year.
[0051] Step S102: Extract raw material consumption data. Extract the annual consumption of each raw material from the "Main Raw Materials and Fuel Information Table" and denote it $M_k$ ($k = 1, 2, \dots, m$, where $m$ is the number of raw material types).
[0052] Step S103: Extract pollutant emission data. Extract the annual emission amount of each pollutant from the "Gas Pollutant Emission Information Table" and the "Wastewater Pollutant Emission Information Table," and denote it $E_j^{\text{act}}$ ($j = 1, 2, \dots, q$, where $q$ is the number of pollutant types).
[0053] Step S104: Retrieve industry emission coefficients. Based on product type and industry classification, retrieve the relevant emission coefficients from the standard knowledge base. The emission coefficient of the $i$-th product for the $j$-th pollutant is denoted $EF_{ij}$. The emission coefficient represents the amount of a given pollutant generated per unit of product produced, expressed in kg / ton of product.
[0054] Step S105: Calculate the theoretical emissions. Based on the material balance principle, the formula for calculating the theoretical emissions is:
$$E_j^{\text{theo}} = \sum_{i=1}^{n} P_i \cdot EF_{ij} + \sum_{k=1}^{m} M_k \cdot CF_{kj}$$
[0055] where $P_i$ is the annual output of the $i$-th product (tons / year), $EF_{ij}$ is the emission coefficient of the $i$-th product for the $j$-th pollutant (kg / ton of product), $M_k$ is the annual consumption of the $k$-th raw or auxiliary material (tons / year), and $CF_{kj}$ is the conversion coefficient of the $k$-th raw material for the $j$-th pollutant.
[0056] Step S106: Calculate the actual emission intensity and the theoretical emission intensity.
[0057] The actual emission intensity is:
$$I_j^{\text{act}} = \frac{E_j^{\text{act}}}{\sum_{i=1}^{n} P_i}$$
[0058] The theoretical emission intensity is:
$$I_j^{\text{theo}} = \frac{E_j^{\text{theo}}}{\sum_{i=1}^{n} P_i}$$
[0059] Step S107: Calculate the discharge intensity consistency index $PICI_j$ of pollutant $j$:
$$PICI_j = 1 - \frac{\left| I_j^{\text{act}} - I_j^{\text{theo}} \right|}{I_j^{\text{theo}}}$$
[0060] The range of $PICI_j$ is $(-\infty, 1]$. $PICI_j = 1$ indicates complete consistency; the smaller the value, the greater the deviation.
[0061] Step S108: Consistency determination. Setting thresholds of 0.5 and 0.7 yields the determination results:
[0062] Step S109: Generate a PICI analysis report. The report includes the PICI value for each pollutant, the judgment result, a comparison table of theoretical and actual emissions, and an analysis of the reasons for the anomaly (such as process improvements, underreporting, etc.). Through the PICI algorithm, the system can automatically detect logical inconsistencies in the declared data. For example, a chemical company declares an annual output of 1000 tons, and the theoretical SO2 emission should be 5 tons (coefficient 5 kg / ton), but the actual declared emission is only 2 tons, resulting in a PICI of 0.4. The system indicates a serious anomaly and recommends that the auditors focus on verifying the desulfurization facility operation records.
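The worked example above can be reproduced with a short Python sketch. The index form `1 - |actual - theoretical| / theoretical` is an assumption, chosen because it yields 0.4 for the chemical company example and has the stated range (-∞, 1]. The function names and the kg / ton unit for the raw material conversion coefficient are likewise assumptions.

```python
def theoretical_emissions_t(products, materials=()):
    """Theoretical emissions in tons per year.

    products:  iterable of (annual output in t, emission coefficient in kg/t)
    materials: iterable of (annual consumption in t, conversion coefficient
               in kg/t); the unit of the conversion coefficient is assumed.
    """
    kg = sum(output * ef for output, ef in products)
    kg += sum(amount * cf for amount, cf in materials)
    return kg / 1000.0  # kg -> tons


def pici(actual_t, theoretical_t):
    """Consistency index; 1 means declared and theoretical emissions agree."""
    if theoretical_t <= 0:
        raise ValueError("theoretical emissions must be positive")
    return 1.0 - abs(actual_t - theoretical_t) / theoretical_t


# Example from the text: 1000 t/year output, 5 kg SO2 per ton, 2 t declared.
theoretical = theoretical_emissions_t([(1000, 5)])
index = pici(actual_t=2.0, theoretical_t=theoretical)
```

With these inputs the theoretical emission is 5 tons and the index is approximately 0.4, which falls below the lower threshold of 0.5 mentioned in step S108.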
[0063] Example 2: The application form for a discharge permit contains multiple related tables. The same data item (such as discharge outlet number or facility number) should remain consistent across different tables. Traditional review relies on manual comparison of each table, which is inefficient and prone to omissions. This example proposes a Cross-Table Data Traceability Verification (CTDV) algorithm to establish data lineage relationships between tables and automatically verify the consistency of data across multiple tables. The CTDV algorithm includes the following steps:
Step S201: Identify key related fields. Based on the discharge permit form specifications, define a set of key related fields:
- Emission outlet identification: emission outlet number, emission outlet name, emission outlet location, etc.
- Facility identification: production facility number, treatment facility number, etc.
- Product identification: product name, production equipment name, etc.
- Pollutant identification: pollutant type, pollutant code, etc.
Step S202: Construct a data lineage graph. Extract the related fields from each table as nodes and establish the lineage relationships between the nodes:
[0064] $$G = (V, \mathcal{E}, W)$$
[0065] where $V$ is the set of nodes, $\mathcal{E}$ is the set of edges, and $W$ is the set of edge weights. The similarity between two nodes is calculated using a combination of edit distance and semantic similarity:
$$w(v_a, v_b) = \alpha \left( 1 - \frac{d_{\text{edit}}(v_a, v_b)}{L_{\max}} \right) + \beta \, \mathrm{sim}_{\text{sem}}(v_a, v_b)$$
[0066] where $d_{\text{edit}}(v_a, v_b)$ is the edit distance between the two node strings, $\mathrm{sim}_{\text{sem}}(v_a, v_b)$ is the semantic similarity between the two nodes, $L_{\max}$ is the maximum length of the two node strings, and $\alpha$ and $\beta$ are the preset weighting coefficients.
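The node similarity described above, which combines edit distance with semantic similarity, might be computed as in the following Python sketch. The linear combination with a length-normalized edit distance is an assumed form, the default coefficients of 0.5 are placeholders because the source values are not available, and the semantic similarity is passed in as a number since the source does not name the embedding model.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def edge_weight(a, b, semantic_sim, alpha=0.5, beta=0.5):
    """Blend lexical and semantic similarity; alpha and beta are placeholders."""
    longest = max(len(a), len(b))
    lexical = 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest
    return alpha * lexical + beta * semantic_sim


same = edge_weight("DA001", "DA001", semantic_sim=1.0)
near = edge_weight("DA001", "DA002", semantic_sim=1.0)
```

Identical identifiers score 1.0, while a one-character difference such as DA001 versus DA002 lowers only the lexical term.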
[0067] Step S203: Perform cross-table consistency verification. For the same identifier field, check its appearance in different tables: Consistency rules include: Identifier uniqueness rule: The same emission outlet number should be unique in the application form and should not represent different emission outlets in different forms.
[0068] Numerical matching rule: The attributes (such as emission height and emission method) of the same emission outlet should be consistent in different tables.
[0069] Reference integrity rule: Facility numbers referenced in the emissions table must exist in the facilities table.
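[0070] Step S204: Calculate the source tracing confidence. For each associated entity, calculate its matching degree in each table:
$$\mathrm{Conf}(e) = \frac{1}{N_e} \sum_{t=1}^{N_e} \mathrm{match}(e, t)$$
[0071] where, for entity $e$, $N_e$ is the number of times it should appear in the application form and $\mathrm{match}(e, t)$ is the matching function for its $t$-th expected occurrence.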
[0072]
[0073] Step S205: Identify data anomalies. Identify anomalies based on a confidence threshold (set to 0.8): Orphaned data: Identifiers that exist in one table but do not appear in other tables.
[0074] Conflicting data: The attribute values of the same identifier conflict in different tables.
[0075] Missing data: Data items that should exist but were not entered.
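Steps S204 and S205 can be sketched together as follows. The sketch assumes the confidence is the fraction of expected occurrences that match a reference value, and that anomalies are only reported when the confidence falls below the 0.8 threshold stated above. The reference-value convention, the table names, and the omission of orphaned-data detection are simplifying assumptions.

```python
CONFIDENCE_THRESHOLD = 0.8  # threshold stated in step S205


def assess_entity(reference_value, observed):
    """Return (confidence, anomalies) for one entity attribute.

    observed maps each table where the entity should appear to the value
    found there, or None if the entity is absent from that table.
    """
    if not observed:
        return 0.0, []
    matches = [value == reference_value for value in observed.values()]
    confidence = sum(matches) / len(matches)
    anomalies = []
    if confidence < CONFIDENCE_THRESHOLD:
        for table, value in observed.items():
            if value is None:
                anomalies.append((table, "missing"))
            elif value != reference_value:
                anomalies.append((table, "conflicting"))
    return confidence, anomalies


# Hypothetical observations of the height of emission outlet DA002.
confidence, anomalies = assess_entity(
    reference_value=20,
    observed={"basic_info": 20, "exhaust_emissions": 15, "monitoring": None},
)
print(round(confidence, 3), anomalies)
```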
[0076] Step S206: Generate a data tracing report. The report includes: a visual representation of the data lineage, a list of consistency statuses for each related entity, details of abnormal data items, and suggestions for correcting related tables (e.g., "The height of emission outlet DA002 is 15m in the exhaust emission table but 20m in the basic information table; it is recommended to unify them"). The CTDV algorithm can detect data inconsistencies that are difficult for humans to detect. For example, a company may fill in emission outlet DA001 as emitting SO2 in its exhaust gas emission form, but DA001 is marked as "wastewater discharge outlet" in the emission outlet basic information form. The system will immediately identify this logical error and alert the auditors.
[0077] Example 3: Discharge permit audits not only need to check the current compliance status but also need to assess future compliance risks. Traditional audits are static snapshot-style checks, lacking dynamic risk assessment capabilities. This example proposes a Compliance Risk Dynamic Assessment Model (CRDAM), which comprehensively assesses a company's compliance risk level based on multiple risk factors.
[0078] The CRDAM algorithm includes the following steps: Step S301: Establish a risk factor system. Define the set of risk factors for discharge permit compliance. Risk of exceeding emission standards: $R_1 = \frac{C_p}{S_p}$
[0079] where $C_p$ is the emission concentration of pollutant $p$ and $S_p$ is its standard limit; $R_1 > 1$ indicates that the limit has been exceeded.
[0080] Risk of exceeding the total limit: $R_2 = \frac{E_{cum}}{E_{permit}}$
[0081] where $E_{cum}$ is the annual cumulative emissions and $E_{permit}$ is the annual emission limit specified in the permit.
[0082] Monitoring compliance risk: $R_3 = 1 - \frac{f_{actual}}{f_{required}}$
[0083] where $f_{actual}$ is the actual monitoring frequency of the enterprise and $f_{required}$ is the monitoring frequency required by the self-monitoring guidelines for the enterprise's industry. The lower the compliance rate of the monitoring frequency, the higher the risk.
[0084] Facility operation risk: $R_4 = 1 - \frac{T_{treat}}{T_{prod}}$
[0085] where $T_{treat}$ is the operating time of the treatment facilities and $T_{prod}$ is the operating time of the production facilities.
[0086] Historical violation risks (historical penalty records): Risk weights are assigned based on the company's historical violation records in the knowledge base:
[0087] Operating condition fluctuation risk: $R_6 = \frac{\sigma_L}{\mu_L}$
[0088] This is the coefficient of variation of the production load; the greater the fluctuation, the higher the risk. $\mu_L$ is the average of the production load values at all time points (such as daily, weekly, or monthly) within a specific statistical period (e.g., the past quarter or year) and represents the company's typical production level during the audit period. $\sigma_L$ is the sample standard deviation of the production load, which measures the average degree of deviation of the production load at each time point from the mean.
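As a rough illustration of step S301, the sketch below computes five of the risk factors from sample figures. The ratio and shortfall forms are assumptions inferred from the variable definitions in this example, the input numbers are invented, and the historical violation risk is left out because its scoring rule is not specified here.

```python
from statistics import mean, stdev


def ratio_risk(actual: float, limit: float) -> float:
    """Assumed form: actual value over its limit; above 1 means exceeded."""
    return actual / limit


def shortfall_risk(actual: float, required: float) -> float:
    """Assumed form: one minus the compliance rate, floored at zero."""
    return max(0.0, 1.0 - actual / required)


def load_fluctuation_risk(loads) -> float:
    """Coefficient of variation: sample standard deviation over mean."""
    return stdev(loads) / mean(loads)


# Hypothetical inputs for one enterprise.
risk_factors = {
    "exceed_standard": ratio_risk(45.0, 50.0),              # mg/m3 vs. limit
    "exceed_total": ratio_risk(80.0, 100.0),                 # t/a vs. permit
    "monitoring": shortfall_risk(3, 4),                      # 3 of 4 quarters
    "facility_operation": shortfall_risk(7000.0, 8000.0),    # hours
    "load_fluctuation": load_fluctuation_risk([80, 100, 120]),
}
print(risk_factors)
```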
[0089] Step S302: Dynamic weight allocation. The weights of the factors are calculated using the Analytic Hierarchy Process (AHP) and then adjusted based on industry characteristics. Key regulated industries (such as thermal power and steel): the weight of the risk of exceeding emission standards is increased to 0.3.
[0090] Industries subject to total quantity control (such as chemicals): the weight of the risk of exceeding the total limit is increased to 0.3.
[0091] Routinely supervised industries: the weights of the factors are evenly distributed.
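[0092] Step S303: Calculate the comprehensive risk index: $R = \sum_{i=1}^{6} w_i R_i$, where $R_i$ is the $i$-th risk factor established in step S301 and $w_i$ is its weight from step S302.
[0093] Generally speaking, the larger the value of $R$, the higher the risk.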
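Steps S302 and S303 can be sketched as a weight adjustment followed by a weighted sum. The sketch starts from equal weights instead of AHP-derived weights, and rescales the remaining weights so that they still sum to 1 after one weight is raised to 0.3; both choices, along with the factor names and sample values, are assumptions made for illustration.

```python
FACTORS = ["exceed_standard", "exceed_total", "monitoring",
           "facility_operation", "historical_violation", "load_fluctuation"]

# Which factor is raised to 0.3 for each industry type (per step S302).
BOOSTED_FACTOR = {
    "key_regulated": "exceed_standard",
    "total_quantity_control": "exceed_total",
}


def industry_weights(industry_type: str) -> dict:
    """Equal base weights, with one factor raised to 0.3 where applicable."""
    weights = {name: 1.0 / len(FACTORS) for name in FACTORS}
    boosted = BOOSTED_FACTOR.get(industry_type)
    if boosted is not None:
        others = [name for name in FACTORS if name != boosted]
        scale = (1.0 - 0.3) / sum(weights[name] for name in others)
        for name in others:
            weights[name] *= scale  # assumed renormalization
        weights[boosted] = 0.3
    return weights


def composite_risk(factors: dict, weights: dict) -> float:
    """Weighted sum of the risk factors."""
    return sum(weights[name] * factors[name] for name in FACTORS)


sample_factors = {"exceed_standard": 0.9, "exceed_total": 0.8,
                  "monitoring": 0.25, "facility_operation": 0.125,
                  "historical_violation": 0.0, "load_fluctuation": 0.2}
weights = industry_weights("key_regulated")
risk_index = composite_risk(sample_factors, weights)
print(round(risk_index, 4))
```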
[0094] Step S304: Risk classification and early warning. Risk levels are determined according to the value of the comprehensive risk index:
[0095] Step S305: Risk Trend Prediction. Based on historical data trends, a comprehensive risk prediction index can be obtained:
[0096] where the trend term is the slope obtained by performing a univariate linear regression over time on the comprehensive risk index sequence from a past period (e.g., the past 3 months).
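The trend estimate in step S305 can be illustrated with an ordinary least-squares slope over equally spaced time points. How the slope is combined with the current index is not recoverable from the text here, so the extrapolation below (last value plus slope times the number of steps ahead) is an assumption, as are the sample values.

```python
def regression_slope(values) -> float:
    """Least-squares slope of values against time indices 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a slope")
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    numerator = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    return numerator / denominator


def predict_risk(history, steps_ahead: int = 1) -> float:
    """Assumed extrapolation: last observed index plus slope times steps."""
    return history[-1] + regression_slope(history) * steps_ahead


monthly_risk = [0.40, 0.45, 0.50]  # hypothetical past three months
slope = regression_slope(monthly_risk)
next_month = predict_risk(monthly_risk)
print(round(slope, 3), round(next_month, 3))
```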
[0097] Step S306: Generate a risk heatmap. Visualize the risk distribution using emission outlets and pollutants as dimensions: the horizontal axis represents the emission outlets, the vertical axis represents the pollutants, and the color depth represents the risk value. See Table 1 for details.
[0098] Table 1
[0099] Step S307: Output a risk warning report. The report includes: the comprehensive risk index and level, detailed scores for each risk factor, a list of high-risk items (e.g., "the risk of SO2 emissions exceeding standards at emission outlet DA001 is the main source of risk"), a risk trend prediction (e.g., "the risk of exceeding the limit is expected to increase in the next month"), and risk mitigation recommendations (e.g., "it is recommended to increase the amount of desulfurizing agent added to reduce the risk of exceeding the limit").
[0100] The CRDAM model provides risk warnings for regulatory authorities. For example, a company's current emission concentrations may be compliant, but the peak production season is approaching. Based on historical data, the model predicts that the risk of exceeding the total limit will increase from 0.5 to 0.85. The system will issue an orange alert in advance, and the enterprise is advised to apply for a temporary production reduction or purchase pollution discharge rights in advance.
[0101] Example 4: The pollutant discharge permit application form contains dozens of related tables (such as the "Main Products and Production Capacity Table," "Raw Materials and Fuel Information Table," and "Waste Gas / Wastewater Discharge Information Table"). Directly processing the original features of all fields would result in high dimensionality (usually exceeding 2000 dimensions) and large computational overhead. Therefore, this example proposes an Adaptive Feature Pooling and Compression (AFPC) algorithm. Tailored to the characteristics of pollutant discharge permit data (multi-type features, numerical sensitivity, and cross-table dependencies), the algorithm reduces dimensionality while preserving as much key information as possible, significantly reducing computational and storage overhead while maintaining the accuracy of the review. The AFPC algorithm includes the following steps: Step S401: Feature importance assessment. Assess the importance of each feature dimension. In discharge permits, different features contribute very differently to compliance audits: emission concentration and total emission limits directly determine whether emissions exceed standards, while equipment model and operating temperature only provide supplementary information. It is therefore necessary to quantify the importance of each feature. Gradient importance (reflecting the feature's degree of impact on the final loss):
[0102] In the review of pollution discharge permits, features with large gradients are often key compliance indicators (e.g., the gradient increases significantly when the emission concentration deviates from the standard limit). When training an exceedance detection model, if a company's "SO2 emission concentration" feature changes from 50 mg/m³ to 60 mg/m³ (with a standard limit of 50 mg/m³), the loss increases sharply, so the absolute value of the gradient for this feature is very large and the feature is judged to be highly important. In contrast, changes in the "device model" feature cause almost no fluctuation in the loss, and its gradient is close to 0. Attention importance (reflecting the degree of attention the model pays to a feature), for an attention query vector $q$ and the key vector $k_i$ of feature $i$: $I_{att}(i) = \mathrm{softmax}_i\left(\frac{q^\top k_i}{\sqrt{d}}\right)$
[0103] where $\sqrt{d}$ is the scaling factor, with $d$ the dimension of the key vectors.
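The attention importance can be illustrated with a scaled dot-product score followed by a softmax over the features. The sketch assumes the standard scaling by the square root of the vector dimension; the query vector, the key vectors, and the field names are toy values chosen only to show the effect.

```python
import math


def attention_importance(query, keys):
    """Softmax over scaled dot products between the query and each key."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical key vectors for two fields of the application form.
field_keys = {
    "emission_outlet_number": [1.0, 0.0, 1.0, 0.0],
    "form_filler_name": [0.0, 1.0, 0.0, 1.0],
}
consistency_query = [1.0, 0.0, 1.0, 0.0]  # toy query for a cross-table check

importance = dict(zip(field_keys,
                      attention_importance(consistency_query,
                                           list(field_keys.values()))))
print(importance)
```

With these toy vectors the outlet-number field receives the larger share of attention, which mirrors the behaviour described for cross-table consistency tasks.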
[0104] For example, in a cross-table consistency verification task (such as verifying whether "emission outlet DA001" appears in multiple tables with consistent attributes), the attention mechanism automatically assigns a high weight to the "emission outlet number" field and ignores irrelevant fields such as "name of the person filling out the form". The attention importance score of the former is therefore significantly higher than that of the latter. Information entropy importance (high-entropy features contain more information): $I_{ent}(i) = -\sum_{v} p_i(v) \log p_i(v)$
[0105] where $p_i(v)$ is the probability that feature $i$ takes the value $v$, obtained by statistically analyzing the frequency of each value of that feature in the training samples.
[0106] For discharge permits, numerical features (such as annual emissions) typically have higher entropy because they vary greatly among different companies; while categorical features (such as "compliant / non-compliant") have lower entropy.
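The entropy comparison between numerical and categorical features can be reproduced with a short sketch that estimates Shannon entropy from value frequencies. The base-2 logarithm and the sample columns are assumptions; continuous features would normally be binned before counting, which is not shown.

```python
import math
from collections import Counter


def entropy_importance(values) -> float:
    """Shannon entropy (in bits) of the empirical value distribution."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical training samples for two features.
annual_emissions = [12.5, 80.1, 3.3, 45.0]  # varies between companies
compliance_flag = ["compliant", "compliant", "compliant", "non-compliant"]

h_numeric = entropy_importance(annual_emissions)
h_categorical = entropy_importance(compliance_flag)
print(round(h_numeric, 3), round(h_categorical, 3))
```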
[0107] Step S402: Adaptive Group Pooling. Features in the discharge permit are grouped according to their type and semantics, with different pooling strategies applied to different groups. Semantic feature groups include field names, entity names (e.g., "Discharge Outlet DA001"), and relational descriptions (e.g., "Referenced from"). Attention pooling is used to retain key entity information. Numerical feature groups include emission concentrations, annual emissions, removal efficiency, etc. Hybrid pooling with learnable weights is used, adaptively balancing average pooling (reflecting overall levels) and max pooling (reflecting peak values). Temporal feature groups include monthly variations, seasonal fluctuations, and production load fluctuations in pollutant emissions. Multi-scale temporal convolutional pooling is used to capture patterns across different periods.
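For the numerical feature group, the hybrid pooling described above can be sketched as a convex combination of average pooling and max pooling. In a trained model the mixing coefficient would be a learnable parameter; here it is passed in as a fixed number, and the sample concentrations are invented.

```python
def hybrid_pool(values, mix: float) -> float:
    """Blend average pooling (weight mix) with max pooling (weight 1 - mix)."""
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    average = sum(values) / len(values)
    return mix * average + (1.0 - mix) * max(values)


# Hypothetical monthly SO2 concentrations in mg/m3.
monthly_concentrations = [30.0, 35.0, 55.0, 40.0]
pooled = hybrid_pool(monthly_concentrations, mix=0.5)
print(pooled)
```

A coefficient near 1 emphasizes the overall level, while a coefficient near 0 emphasizes the peak value, which matters most for exceedance checks.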
[0108] Step S403: Compress high-dimensional features to low-dimensional features through learnable transformations. The feature dimension of a pollution discharge permit is typically 2048, which is compressed to 256 dimensions (compression ratio 8:1).
[0109] Autoencoder compression:
[0110] $z = \sigma(W_e x + b_e)$, $\hat{x} = W_d z + b_d$
[0111] $L_{AE} = \lVert x - \hat{x} \rVert_2^2 + \lambda \lVert z \rVert_1$
[0112] where $z$ is the compressed low-dimensional feature vector; $x$ is the original high-dimensional feature vector; $W_e$ is the encoder weight matrix and $b_e$ is the encoder bias vector; $\hat{x}$ is the reconstructed feature vector; $W_d$ is the decoder weight matrix and $b_d$ is the decoder bias vector; $\sigma$ is the activation function; and $\lambda$ controls the sparsity (a preset value in this embodiment). L1 regularization drives many dimensions of the compressed features to zero, which suits the large number of empty or missing fields in pollution discharge permit data (for example, a company that does not emit a certain pollutant has a corresponding feature of zero). For example, if a company does not emit "chromic acid mist," the corresponding dimension in the original feature is 0. When learning the compression, the autoencoder also pushes the compressed outputs associated with these zero-valued dimensions toward 0, thereby saving computational and storage resources.
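The autoencoder objective can be sketched as a forward pass followed by a reconstruction error plus an L1 penalty on the compressed vector. The ReLU activation, the sparsity coefficient of 0.01, and the tiny hand-set weights (4 dimensions compressed to 2) are all assumptions chosen to keep the example readable; no training loop is shown.

```python
def affine(weights, bias, vector):
    """Compute W @ vector + bias for list-of-rows weights."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def relu(vector):
    return [max(0.0, v) for v in vector]


def autoencoder_loss(x, enc_w, enc_b, dec_w, dec_b, sparsity=0.01):
    """Return (z, x_hat, loss) with loss = squared error + sparsity * L1(z)."""
    z = relu(affine(enc_w, enc_b, x))    # compressed features
    x_hat = affine(dec_w, dec_b, z)      # reconstruction
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    l1_penalty = sum(abs(v) for v in z)
    return z, x_hat, reconstruction + sparsity * l1_penalty


# Toy feature vector: dimensions 2 and 4 are pollutants the company
# does not emit, so they are zero.
x = [2.0, 0.0, 1.0, 0.0]
enc_w = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
enc_b = [0.0, 0.0]
dec_w = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
dec_b = [0.0, 0.0, 0.0, 0.0]

z, x_hat, loss = autoencoder_loss(x, enc_w, enc_b, dec_w, dec_b)
print(z, x_hat, round(loss, 4))
```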
[0113] Variational compression (VAE): A variational autoencoder (VAE) introduces a latent variable $z$ and assumes that its posterior distribution is a Gaussian distribution with independent dimensions:
[0114] $q(z \mid x) = \mathcal{N}\left(z; \mu, \mathrm{diag}(\sigma^2)\right)$
[0115] $L_{VAE} = \mathbb{E}_{q(z \mid x)}\left[\lVert x - \hat{x} \rVert_2^2\right] + D_{KL}\left(q(z \mid x) \,\Vert\, \mathcal{N}(0, I)\right)$
[0116] where $\mu$ is the mean vector of the latent variable, representing the central location of the compressed feature, and $\sigma$ is the standard deviation vector of the latent variable, representing the uncertainty in each dimension. The KL divergence term encourages the latent space to approximate a standard normal distribution, yielding a smooth and continuous compressed representation. The VAE is well suited to generating "virtual but compliant" pollution discharge data for data augmentation; in addition, the uncertainty of the latent space ($\sigma$) can be used to evaluate the credibility of the compressed features.
[0117] Step S404: Task-aware compression. The compression strategy is dynamically adjusted according to the downstream audit task. To handle multiple tasks simultaneously, an architecture combining a shared encoder with task-specific heads is used. Different audit tasks (concentration audit, total quantity audit, facility operation audit) focus on different fields; a gating network dynamically adjusts the importance of feature channels based on the task embedding.
[0118] Gating vector:
[0119] where the inputs are the task embedding vector and the original feature vector.
[0120] This yields the following task-specific features:
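The gating step can be sketched as follows, assuming a common formulation in which a sigmoid gate is computed from the concatenated task embedding and feature vector and then multiplied element-wise with the features. The gate parameters, the one-hot task embeddings, and the two-channel feature vector are hypothetical values; the exact gating formula of this embodiment is not reproduced here.

```python
import math


def sigmoid(value: float) -> float:
    return 1.0 / (1.0 + math.exp(-value))


def gate_features(task_embedding, features, gate_w, gate_b):
    """Scale each feature channel by a task-dependent gate in (0, 1)."""
    joint = task_embedding + features  # list concatenation
    gate = [sigmoid(sum(w * v for w, v in zip(row, joint)) + b)
            for row, b in zip(gate_w, gate_b)]
    return [g * f for g, f in zip(gate, features)]


# Channel 0: concentration-related feature; channel 1: total-quantity feature.
features = [0.6, 0.8]
gate_w = [[4.0, -4.0, 0.0, 0.0],   # gate for channel 0
          [-4.0, 4.0, 0.0, 0.0]]   # gate for channel 1
gate_b = [0.0, 0.0]

concentration_task = [1.0, 0.0]
total_quantity_task = [0.0, 1.0]

gated_conc = gate_features(concentration_task, features, gate_w, gate_b)
gated_total = gate_features(total_quantity_task, features, gate_w, gate_b)
print(gated_conc, gated_total)
```

The same feature vector is thus emphasized differently depending on which audit task is active.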
[0121] Multi-task decoupling: Discharge permit review often requires simultaneous assessment of multiple compliance indicators (such as whether concentration exceeds standards, whether total quantity exceeds limits, and whether monitoring is compliant). A shared encoder extracts common features and then distributes them to different task heads.
[0122] Shared features:
[0123]
[0124] where each task head is an independent small network that learns task-specific patterns from the shared features. This gives the multi-task loss:
[0125] where the regularization coefficient controls the regularization strength (0.001 in this embodiment). L2 regularization prevents the shared features from overfitting to a specific task. For example, a discharge permit review may simultaneously need to determine whether the SO2 concentration exceeds the standard (task 1), whether annual NOx emissions exceed the limit (task 2), and whether the desulfurization facilities are operating normally (task 3). The shared encoder extracts common information (such as basic enterprise information and the list of emission outlets) from the original features, and the three task heads learn their respective discrimination patterns. This design avoids training a separate model for each task, significantly reducing computational cost.
[0126] Example 5: Complete review process. As shown in Figure 1, the actual operation of the system is illustrated through a complete scenario of reviewing a pollutant discharge permit application form. Scenario description: This embodiment takes the discharge permit and environmental impact assessment report submitted by a chemical company as an example to explain in detail how the intelligent auditing system proposed in this invention jointly runs multi-table parsing, knowledge base retrieval, the workflow engine, and the four core algorithm models (PICI, CTDV, CRDAM, AFPC), and finally outputs the audit conclusion and triggers the corresponding processing flow.
[0127] Step S501: File Upload. Enterprises upload their pending discharge permit and reference documents (relevant industry-specific technical specifications, self-monitoring plan, environmental impact assessment report, and previous year's review comments) to the system. The system automatically identifies the file type and uses the multi-table intelligent parsing module to extract the table data.
[0128] Step S502: Multi-knowledge base semantic retrieval. Based on the parsed field names and values, the multi-knowledge base retrieval module performs semantic retrieval on multiple knowledge bases. Emission coefficient database: the SO2 emission coefficient corresponding to product A is retrieved, giving 5 kg per ton of product. Standard limit database: the SO2 emission concentration limit for the enterprise's industry is retrieved, giving 50 mg/m³. Self-monitoring technical guidelines: the required monitoring frequency for the company is retrieved as "once per quarter" (i.e., four times per year). Historical penalty records database: a search for the company's penalties over the past three years finds none. The search results are stored in a temporary cache as part of the audit context.
[0129] Step S503: Customized Workflow Configuration: The system automatically loads the corresponding audit workflow configuration based on the company's industry (chemical) and application type (annual execution report). The workflow includes the following rule groups, as shown in Table 2.
[0130] Table 2
[0131] The workflow engine combines the audit context generated in step S502 with the rule group configuration to generate a structured audit feature vector for subsequent algorithm model calls.
[0132] Step S504: Parallel computation of the audit analysis models.
[0133] This involves calculating the consistency index of pollution discharge intensity, source tracing confidence, and comprehensive risk index, and then applying adaptive feature compression. The compressed features are used for the comprehensive confidence assessment in step S505.
[0134] Step S505: Comprehensive confidence assessment and rule conflict handling.
[0135] The system performs a weighted summation of the inspection results from the above three dimensions: Pollution discharge intensity consistency index: the score is 0.2, which is classified as a severe abnormality.
[0136] Source tracing confidence: the score is 0.3.
[0137] Comprehensive risk index: the score is 0.9, which is classified as low risk.
[0138] The initial industry weights for this company, in the same order: $w_1 = 0.4$, $w_2 = 0.3$, $w_3 = 0.3$.
[0139] Overall confidence level: $0.4 \times 0.2 + 0.3 \times 0.3 + 0.3 \times 0.9 = 0.44$.
[0140] The system detected a rule conflict: PICI determined it to be "severely abnormal" and recommended immediate reversal, but CRDAM determined it to be "low risk" and recommended routine supervision. Based on the priority threshold configuration (PICI has higher priority than CRDAM in the chemical industry), the higher-priority rule (PICI) prevails. The system then triggers a manual review channel and generates a conflict explanation.
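The arithmetic of step S505 and the priority-based conflict handling can be checked with a few lines of Python. The scores and weights are the ones given in this example; the numeric priority values and the function name are hypothetical encodings of "PICI has higher priority than CRDAM in the chemical industry".

```python
# Scores and industry weights from the worked example.
scores = {"PICI": 0.2, "CTDV": 0.3, "CRDAM": 0.9}
weights = {"PICI": 0.4, "CTDV": 0.3, "CRDAM": 0.3}

overall_confidence = sum(weights[name] * scores[name] for name in scores)

# Conflicting recommendations and an assumed priority encoding
# (a larger number means a higher priority).
recommendations = {"PICI": "severely abnormal", "CRDAM": "low risk"}
priority = {"PICI": 2, "CRDAM": 1}


def resolve_conflict(recommendations, priority):
    """Return the model whose rule prevails, together with its verdict."""
    winner = max(recommendations, key=lambda name: priority[name])
    return winner, recommendations[winner]


winner, verdict = resolve_conflict(recommendations, priority)
print(round(overall_confidence, 2), winner, verdict)
```

The weighted sum reproduces the overall confidence of 0.44 used in step S506, and the PICI verdict prevails because of its higher priority.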
[0141] Step S506: Trigger the processing flow. Based on the overall confidence level (0.44) and the conflict resolution result, the system automatically performs the following operations: it marks the audit conclusion as "awaiting manual review", pushes it to the auditor's workbench, and generates a detailed audit report; it initiates a "data correction" task for the PICI anomaly, requiring the company to explain the reasons for the SO2 emission deviation; and it automatically corrects the CTDV conflict item by changing the type of emission outlet DA001 from "wastewater discharge outlet" to "exhaust gas discharge outlet".
[0142] The present invention also proposes an electronic device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the environmental risk dynamic assessment and intelligent auditing method.
[0143] The present invention also proposes a computer-readable storage medium for storing computer instructions, which, when executed by a processor, implement the steps of the environmental risk dynamic assessment and intelligent auditing method.
[0144] The memory in this application embodiment can be volatile memory or non-volatile memory, or it can include both volatile and non-volatile memory. The non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory can be random access memory (RAM), which is used as an external cache. By way of example, but not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDRSDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous linked dynamic random access memory (SLDRAM), and direct rambus RAM (DR RAM). It should be noted that the memory used in the methods described in this invention is intended to include, but is not limited to, these and any other suitable types of memory.
[0145] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another via wired (e.g., coaxial cable, fiber optic, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium can be any available medium accessible to a computer or a data storage device such as a server or data center that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., high-density digital video discs (DVDs)), or semiconductor media (e.g., solid-state disks (SSDs)).
[0146] In implementation, each step of the above method can be completed by integrated logic circuits in the processor's hardware or by instructions in software. The steps of the method disclosed in the embodiments of this application can be directly implemented by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules can reside in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other mature storage media in the art. This storage medium is located in memory, and the processor reads information from the memory and, in conjunction with its hardware, completes the steps of the above method. To avoid repetition, detailed descriptions are omitted here.
[0147] It should be noted that the processor in the embodiments of this application can be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above method embodiments can be completed by the integrated logic circuitry in the processor's hardware or by instructions in software form. The processor can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application can be directly embodied as execution by a hardware decoding processor, or as a combination of hardware and software modules in the decoding processor. The software modules can be located in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other mature storage media in the art. This storage medium is located in memory, and the processor reads the information in the memory and, in conjunction with its hardware, completes the steps of the above methods.
[0148] The present invention provides a detailed description of a dynamic environmental risk assessment and intelligent auditing method and system. Specific examples have been used to illustrate the principles and implementation methods of the present invention. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of the present invention. At the same time, those skilled in the art will recognize that there will be changes in the specific implementation methods and application scope based on the ideas of the present invention. Therefore, the content of this specification should not be construed as a limitation of the present invention.
Claims
1. A method for dynamic environmental risk assessment and intelligent auditing, characterized in that, The method includes: Step 1: Receive the files and reference files uploaded by users for review; Step 2: Extract the table structure and text content from the document to be reviewed using the multi-table intelligent parsing module to generate structured table data; Step 3: Use the multi-knowledge base retrieval module to perform semantic retrieval on the structured table data to generate an audit context; Step 4: Process the audit context based on the preset audit workflow configuration to generate an audit feature vector; Step 5: Input the audit feature vector into the preset audit analysis model to generate preliminary audit conclusions. The audit analysis model includes a pollution intensity consistency index calculation model, a cross-table data tracing verification model, and a compliance risk dynamic assessment model. The preliminary audit conclusions include logical consistency check results, cross-table consistency check results, and risk assessment results. Step 6: Perform a comprehensive confidence assessment based on the logical consistency check results, cross-table consistency check results, and risk assessment results, and trigger the corresponding processing flow based on the assessment results.
2. The method according to claim 1, characterized in that, The pollution discharge intensity consistency index calculation model is used to calculate pollutants. Consistency Index of Discharge Intensity The calculation formula is: in, This represents the actual emissions. This is the theoretical emission rate calculated based on the emission coefficient. Actual emission intensity Theoretical emission intensity; In the formula, Annual production volume of the product For the first This product is effective against pollutants The discharge coefficient, This refers to the amount of raw materials consumed. The raw material conversion coefficient, For the total number of products, This represents the total number of types of raw and auxiliary materials.
3. The method according to claim 1, characterized in that, the cross-table data tracing verification model employs an entity association network algorithm to construct an entity association network for emission outlet identifiers, facility identifiers, product identifiers, and pollutant identifiers, verifies the consistency of data across multiple tables, and calculates the tracing confidence for each associated entity; the entity association network is $G = (V, E, W)$, where $V$ is the set of related field nodes, $E$ is the set of edges, and $W$ is the set of edge weights; the tracing confidence is $Conf(e) = \frac{1}{N_e} \sum_{k=1}^{N_e} match(e, T_k)$, where $N_e$ is the number of times entity $e$ should appear in the application form and $match(e, T_k)$ is the matching function; the edge weight is $w_{ij} = \alpha \left(1 - \frac{ED(v_i, v_j)}{L_{\max}}\right) + \beta \cdot Sim(v_i, v_j)$, where $ED(v_i, v_j)$ is the edit distance between the two node strings, $Sim(v_i, v_j)$ is the semantic similarity between the two nodes, $L_{\max}$ is the maximum length of the two node strings, and $\alpha$ and $\beta$ are the weighting coefficients.
4. The method according to claim 1, characterized in that, the compliance risk dynamic assessment model calculates a comprehensive risk index based on multiple risk factors, the calculation formula being $R = \sum_{i=1}^{6} w_i R_i$, where $w_i$ is the weight of risk factor $R_i$; the risk factors include: the risk of exceeding emission standards $R_1 = \frac{C_p}{S_p}$, where $C_p$ is the emission concentration of pollutant $p$ and $S_p$ is the standard limit, $R_1 > 1$ indicating that the standard has been exceeded; the risk of exceeding the total limit $R_2 = \frac{E_{cum}}{E_{permit}}$, where $E_{cum}$ is the annual cumulative emissions and $E_{permit}$ is the annual emission limit specified in the permit; the monitoring compliance risk $R_3 = 1 - \frac{f_{actual}}{f_{required}}$, where $f_{actual}$ is the actual monitoring frequency of the enterprise and $f_{required}$ is the monitoring frequency required by the self-monitoring guidelines for the enterprise's industry; the facility operation risk $R_4 = 1 - \frac{T_{treat}}{T_{prod}}$, where $T_{treat}$ is the operating time of the treatment facilities and $T_{prod}$ is the operating time of the production facilities; the historical violation risk $R_5$, for which risk weights are assigned based on the company's historical violation records in the knowledge base; and the operating condition fluctuation risk $R_6 = \frac{\sigma_L}{\mu_L}$, where $\mu_L$ is the average of the production load values at all time points within a specific statistical period and $\sigma_L$ is the sample standard deviation of the production load, which measures the average degree of deviation of the production load at each time point from the mean.
5. The method according to claim 1, characterized in that, The method further includes: Priority thresholds are set for the inspection results of each dimension; Check whether there are rule conflicts in the inspection results of each dimension; When a rule conflict is detected, the priority threshold corresponding to the conflicting rule is obtained; The higher priority rule is executed according to the relationship between the priority thresholds, and the rule processing result is generated.
6. The method according to claim 1, characterized in that, The comprehensive confidence assessment specifically includes: Set corresponding weight values for the inspection results of each dimension; The weight values are dynamically adjusted according to the rule type, which includes pollution intensity inspection rules, cross-table consistency rules, and risk assessment rules. The inspection results of each dimension are weighted and calculated using the adjusted weight values to obtain the comprehensive confidence assessment result.
7. The method according to claim 1, characterized in that, The processing flow specifically includes: Obtain the output results of the dynamic compliance risk assessment model and the determination results of the comprehensive confidence assessment; Based on the output and the judgment results, you can choose to activate the automatic review channel or the manual review channel.
8. An environmental risk dynamic assessment and intelligent auditing system, characterized in that, The system includes: The file upload and management module is used to receive files and reference files uploaded by users for review. The multi-table intelligent parsing module is used to extract the table structure and text content from the documents to be reviewed and generate structured table data. A multi-knowledge base retrieval module is used to perform semantic retrieval on the structured table data and generate an audit context; A customized review workflow engine is used to process the review context based on a preset review workflow configuration to generate review feature vectors; The audit analysis model module is used to input the audit feature vector into a preset audit analysis model to generate preliminary audit conclusions. The audit analysis model includes a pollution discharge intensity consistency index calculation model, a cross-table data traceability verification model, and a compliance risk dynamic assessment model. The assessment and processing module is used to conduct a comprehensive confidence assessment based on the inspection results of various dimensions, and to trigger the corresponding processing flow based on the assessment results.
9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, When the processor executes the computer program, it implements the steps of the method according to any one of claims 1-7.
10. A computer-readable storage medium for storing computer instructions, characterized in that, When the computer instructions are executed by the processor, they implement the steps of the method according to any one of claims 1-7.