A Multimodal Document Parsing and Evaluation Method and System for Bidding and Tendering Audit Scenarios

By employing a multimodal document parsing and evaluation method, combined with visual enhancement and a bimodal risk assessment model, the problems of difficult document parsing and omission of hidden risks in bidding audits have been solved, achieving intelligent and precise audit parsing.

CN122309718A · Pending · Publication Date: 2026-06-30 · SHENYUAN TECHNOLOGY (NANJING) CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: SHENYUAN TECHNOLOGY (NANJING) CO LTD
Filing Date: 2026-06-02
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

Current bidding and tendering audits face challenges such as heterogeneous document formats, massive content leading to parsing difficulties, high OCR recognition error rates, and a lack of understanding of soft rules, resulting in low audit efficiency and the omission of hidden risks.

Method used

We adopt a vision-enhanced hybrid parsing strategy, combining a bimodal risk assessment model with hard and soft rules, and employ a multimodal document parsing and assessment method, including document format recognition, page-level multimodal detection, text sequence reconstruction, structured data extraction, and risk assessment.

Benefits of technology

It enables intelligent and precise analysis of bidding documents, improving audit efficiency and quality, reducing the omission of hidden risks, and enhancing the intelligence and precision of the review.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a multimodal document parsing and evaluation method and system for bidding and tendering audit scenarios. First, it collects the tender documents and bid documents to be audited to obtain a set of documents to be audited. Then, it identifies the document format attributes in the set of documents to be audited to obtain initial heterogeneous corpus. Next, it performs page-level multimodal detection on the initial heterogeneous corpus and obtains detection tags. This invention achieves the function of accurately constructing audit corpus from massive heterogeneous documents using a vision-enhanced hybrid parsing strategy, and then performing multimodal document parsing and evaluation using image processing algorithms. Furthermore, the dual-modal risk assessment model combining hard and soft rules not only solves the problems of low efficiency, incomplete coverage, and easy omission of hidden risks in existing manual review, but also solves the parsing difficulties caused by poor scanned document quality. This not only makes audit compliance review more intelligent and accurate, but also significantly improves the quality and efficiency of the review.

Description

Technical Field

[0001] This invention relates to the field of multimodal document parsing and evaluation technology, and specifically to a multimodal document parsing and evaluation method and system for bidding and tendering audit scenarios.

Background Technology

[0002] Bidding and tendering audits are a systematic review and evaluation of the compliance, authenticity, and effectiveness of the entire bidding and tendering process. They are applied across numerous fields, including engineering construction and corporate procurement. The fundamental purpose of bidding and tendering audits is to ensure fairness, impartiality, and transparency in bidding and tendering activities; they are a crucial link in ensuring project compliance and the security of funds.

[0003] In existing bidding and tendering audit processes, auditors generally have to compare and review large numbers of tender documents and bid documents manually. However, unstructured documents such as audit regulations and bidding documents are vast in content and varied in format, which makes document parsing difficult. Furthermore, some tender documents are scanned copies made after stamping and suffer from tilting, blurring, and excessive noise. Applying Optical Character Recognition (OCR) directly to such pages yields a high error rate, and relying entirely on OCR is slow and wasteful of resources. Conversely, relying entirely on text-layer extraction cannot handle scanned copies and stamped pages. In addition, existing audit processes are mostly based on hard rules expressed as regular expressions and lack the ability to understand soft rules based on expert experience, such as whether a technical solution is reasonable or whether a price constitutes an unbalanced bid. It is therefore necessary to design a multimodal document parsing and evaluation method and system for bidding and tendering audit scenarios.

Summary of the Invention

[0004] The purpose of this invention is to overcome the shortcomings of the existing technology, namely the parsing difficulty, rigid rules, and lack of semantic understanding that arise from the heterogeneous formats and large volume of bidding documents and that seriously affect the quality and efficiency of audit work. To this end, the invention provides a multimodal document parsing and evaluation method and system for bidding and tendering audit scenarios. A vision-enhanced hybrid parsing strategy accurately constructs the audit corpus from massive heterogeneous documents, with image processing algorithms supporting the parsing of scanned pages. The bimodal risk assessment model, which combines hard and soft rules, addresses the low efficiency, incomplete coverage, and easy omission of hidden risks in existing manual review, while the vision-enhanced parsing addresses the parsing difficulties caused by poor scanned document quality. Together these make audit compliance review more intelligent and accurate and improve the quality and efficiency of the review.

[0005] To achieve the above objectives, the technical solution adopted by the present invention is as follows. A multimodal document parsing and evaluation method for bidding and tendering audit scenarios includes the following steps:
Step A: Collect the tender documents and bid documents to be audited to obtain the set of documents to be audited, then identify the document format attributes in that set to obtain the initial heterogeneous corpus.
Step B: Perform page-level multimodal detection on the initial heterogeneous corpus to obtain detection markers, then perform split parsing and text sequence reconstruction based on visual feature enhancement according to the detection markers to establish a full structured audit corpus.
Step C: Use a large language model to extract audit elements from the full structured audit corpus to obtain structured data objects.
Step D: Construct a hard rule set and a soft rule set based on the assessment intent that the user enters in natural language, then establish a bimodal risk assessment model from the hard rule set and the soft rule set.
Step E: Use the bimodal risk assessment model to perform multi-dimensional risk matching and quantitative scoring on the structured data objects to obtain a comprehensive risk score, then generate a risk assessment report based on the comprehensive risk score.

[0006] In the aforementioned method, step A collects the tender documents and bid documents to be audited to obtain a set of documents to be audited, and then identifies the document format attributes in that set to obtain the initial heterogeneous corpus. The specific steps are as follows:
Step A1: Collect the tender documents and bid documents to be audited to obtain the set of documents to be audited. Formula (1) defines this set as consisting of the tender document together with the bid document of each bidding unit, indexed by bidding unit.
Step A2: Identify the document format attributes in the set of documents to be audited to obtain the initial heterogeneous corpus. Specifically, read the binary file-header signature of each document in the set and identify the actual format of the file; then filter out the documents that fail format verification and the documents whose format is corrupted. The remaining documents form the initial heterogeneous corpus.
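The image of formula (1) is not reproduced in this text. A reconstruction consistent with the definitions in Step A1 might read as follows; the symbols $D$, $T$, $B_i$ and $n$ are notation assumed here for illustration and are not taken from the filing.

```latex
D = \{\, T,\; B_1,\; B_2,\; \dots,\; B_n \,\}
```

Under this notation, $D$ would be the set of documents to be audited, $T$ the tender document, and $B_i$ the bid document of the $i$-th bidding unit, with $n$ bidding units in total.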

[0007] In the aforementioned method, step B performs page-level multimodal detection on the initial heterogeneous corpus to obtain detection markers, and then performs visual-feature-enhanced split parsing and text sequence reconstruction according to the detection markers to establish a full structured audit corpus. The specific steps are as follows:
Step B1: Perform page-level multimodal detection on the initial heterogeneous corpus to obtain detection markers. Specifically, raster-sample each page of each document in the initial heterogeneous corpus, use connected component analysis (CCA) to calculate the distribution density of page elements, and combine it with a lightweight convolutional neural network (CNN) that extracts the high-dimensional texture features of the page. Formula (2) gives the detection marker of each page as the output of a page complexity detection function based on visual perception. Its inputs are the page texture feature vector extracted by convolution operations and the distribution density feature of the black connected regions on the page, and the formula uses a preset texture complexity threshold.
Step B2: Perform visual-feature-enhanced split parsing and text sequence reconstruction according to the detection markers to establish the full structured audit corpus. The specific steps are as follows:
Step B21: Obtain the content of each single page, as defined by formula (3). When the detection marker of a page is true, the page is passed through an image preprocessing function and then through an optical character recognition model based on the Transformer architecture; the preprocessing function includes tilt correction based on the Radon transform and adaptive binarization denoising based on Otsu's algorithm. When the detection marker is false, the page content is obtained with a direct text-stream extraction algorithm for PDFs.
Step B22: Splice all single-page contents in page-number order and reorganize the paragraphs to obtain the full structured audit corpus.
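The images of formulas (2) and (3) are not reproduced in this text. One reconstruction consistent with the definitions in Steps B1 and B21 is sketched below. All symbols ($M_i$, $\Phi$, $\mathbf{f}_{\mathrm{cnn}}$, $\rho_{\mathrm{cca}}$, $P_i$, $\tau$, $C_i$, $\mathrm{Pre}$, $\mathrm{Extract}$) are assumed notation, and the way the threshold enters formula (2) is a guess rather than something the text states.

```latex
% Formula (2), assumed form: marker is true when page complexity exceeds the threshold
M_i = \Big[\, \Phi\big(\mathbf{f}_{\mathrm{cnn}}(P_i),\; \rho_{\mathrm{cca}}(P_i)\big) > \tau \,\Big]

% Formula (3), assumed form: route each page by its marker
C_i =
\begin{cases}
\mathrm{OCR}\big(\mathrm{Pre}(P_i)\big), & M_i = \text{true} \\
\mathrm{Extract}(P_i), & M_i = \text{false}
\end{cases}
```

Here $P_i$ would be the $i$-th page, $M_i$ its detection marker, $\tau$ the preset texture complexity threshold, and $C_i$ the single-page content.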

[0008] In the aforementioned method, step C uses a large language model to extract audit elements from the full structured audit corpus to obtain structured data objects. The specific steps are as follows:
Step C1: Construct a prompt template for extracting audit elements. Specifically, the prompt template is built for a set of extraction targets, with one prompt for each target in the set.
Step C2: Based on the prompt template, use the large language model to extract audit elements from the full structured audit corpus to obtain structured data objects. Formula (4) expresses the output structured data object as the result of the reasoning process of the large language model.
Step C3: Perform a confidence check on the structured data object. If the confidence of the structured data object is lower than a preset confidence threshold, a manual review flag is triggered.
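The image of formula (4) is not reproduced in this text. A reconstruction consistent with Steps C1 to C3 might read as follows; the symbols $\mathcal{P}$, $p_k$, $m$, $O$, $\mathcal{C}$, $\mathrm{conf}$ and $\theta$ are assumed notation, and the exact arguments of the model call are a guess.

```latex
\mathcal{P} = \{\, p_1,\; p_2,\; \dots,\; p_m \,\}, \qquad
O = \mathrm{LLM}\big(\mathcal{P},\; \mathcal{C}\big), \qquad
\text{manual review if } \mathrm{conf}(O) < \theta
```

Under this notation, $p_k$ would be the prompt for the $k$-th extraction target, $\mathcal{C}$ the full structured audit corpus, $O$ the structured data object, and $\theta$ the preset confidence threshold.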

[0009] In the aforementioned method, step D constructs a hard rule set and a soft rule set based on the assessment intent that the user enters in natural language, and then establishes a bimodal risk assessment model from the hard rule set and the soft rule set. The specific steps are as follows:
Step D1: Construct the hard rule set and the soft rule set based on the assessment intent that the user enters in natural language. The specific steps are as follows:
Step D11: Construct the hard rule set. Specifically, extract explicit numerical constraints and qualification thresholds from the tender documents to establish the hard rule set.
Step D12: Construct the soft rule set. Specifically, receive the business instructions (Inst) that audit experts enter in natural language, then use a semantic mapping operator to convert the business instructions into the soft rule set, as defined by formula (5). The mapping operator converts natural language into logical rules and draws on an auditing industry knowledge base.
Step D2: Establish the bimodal risk assessment model from the hard rule set and the soft rule set. Specifically, the two rule sets are combined to obtain the bimodal risk assessment model; formula (6) gives the resulting complete rule set of the model.
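The images of formulas (5) and (6) are not reproduced in this text. A reconstruction consistent with Steps D12 and D2 might read as follows. The symbols $R_h$, $R_s$, $R$, $\mathcal{M}$ and $\mathrm{KB}$ are assumed notation (only Inst appears in the text), and reading "combined" as a set union is an assumption.

```latex
% Formula (5), assumed form
R_s = \mathcal{M}\big(\mathrm{Inst},\; \mathrm{KB}\big)

% Formula (6), assumed form
R = R_h \cup R_s
```

Here $R_h$ would be the hard rule set, $R_s$ the soft rule set, $\mathcal{M}$ the mapping operator from natural language to logical rules, $\mathrm{KB}$ the auditing industry knowledge base, and $R$ the complete rule set of the bimodal risk assessment model.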

[0010] In the aforementioned method, step E uses the bimodal risk assessment model to perform multi-dimensional risk matching and quantitative scoring on the structured data objects to obtain a comprehensive risk score, and then generates a risk assessment report based on the comprehensive risk score. The specific steps are as follows:
Step E1: Use the bimodal risk assessment model to perform multi-dimensional risk matching and quantitative scoring on the structured data objects to obtain a comprehensive risk score. Specifically, for the structured data objects, retrieve the matching tender document rule clauses and the bid response facts, then input them into the bimodal risk assessment model to perform hard rule comparison and soft semantic assessment and obtain the comprehensive risk score, as defined by formula (7). The terms of formula (7) are: the comprehensive risk score; the weighting coefficients for hard rules and soft rules, respectively; the total number of hard rules participating in the evaluation and the index of a hard rule clause; a deterministic logic comparison function applied to the bid response facts and the indexed hard rule; a hard rule penalty indicator function, which takes the value 1 if the bid response facts violate the indexed hard rule and 0 if they do not; the total number of soft rules participating in the evaluation and the index of a soft rule clause; and a semantic reasoning scoring function based on the large language model, which calculates the semantic deviation between the bid response facts and the indexed soft rule.
Step E2: Generate a risk assessment report based on the comprehensive risk score. The report includes the overall risk level, details of the risk points, the original text of the relevant clauses, and rectification suggestions.
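The image of formula (7) is not reproduced in this text. A reconstruction consistent with the term definitions in Step E1 might read as follows. Every symbol is assumed notation, and whether the two sums are normalized (for example divided by the number of rules) cannot be determined from the text, so the unnormalized form below is only one possibility.

```latex
S = \alpha \sum_{j=1}^{N_h} \mathbb{I}\Big( f\big(F,\; r^{h}_{j}\big) \Big)
  \;+\; \beta \sum_{k=1}^{N_s} g\big(F,\; r^{s}_{k}\big)
```

Under this notation, $S$ would be the comprehensive risk score, $\alpha$ and $\beta$ the weights for hard and soft rules, $N_h$ and $N_s$ the numbers of hard and soft rules, $F$ the bid response facts, $r^{h}_{j}$ the $j$-th hard rule, $r^{s}_{k}$ the $k$-th soft rule, $f$ the deterministic logic comparison function, $\mathbb{I}$ the penalty indicator (1 on violation, 0 otherwise), and $g$ the semantic reasoning scoring function.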

[0011] A multimodal document parsing and evaluation system for bidding and tendering audit scenarios includes a document recognition module, a multimodal detection module, an audit element extraction module, an assessment model construction module, and a risk assessment module. The document recognition module collects the tender documents and bid documents to be audited to obtain a set of documents to be audited, then identifies the document format attributes in the set to obtain the initial heterogeneous corpus. The multimodal detection module performs page-level multimodal detection on the initial heterogeneous corpus to obtain detection markers, then performs visual-feature-enhanced split parsing and text sequence reconstruction according to the detection markers to establish a full structured audit corpus. The audit element extraction module uses a large language model to extract audit elements from the full structured audit corpus to obtain structured data objects. The assessment model construction module constructs a hard rule set and a soft rule set based on the assessment intent that the user enters in natural language, and then establishes a bimodal risk assessment model from the two rule sets. The risk assessment module uses the bimodal risk assessment model to perform multi-dimensional risk matching and quantitative scoring on the structured data objects to obtain a comprehensive risk score, and then generates a risk assessment report based on the comprehensive risk score.

[0012] The beneficial effects of this invention are as follows. The invention provides a multimodal document parsing and evaluation method and system for bidding and tendering audit scenarios. It first collects the tender documents and bid documents to be audited to obtain a set of documents to be audited, and identifies the document format attributes in that set to obtain the initial heterogeneous corpus. It then performs page-level multimodal detection on the initial heterogeneous corpus to obtain detection markers, and performs split parsing and text sequence reconstruction based on visual feature enhancement according to the detection markers to establish a full structured audit corpus. Next, it uses a large language model to extract audit elements from the full structured audit corpus to obtain structured data objects. It then constructs a hard rule set and a soft rule set based on the assessment intent that the user enters in natural language, and establishes a bimodal risk assessment model from the two rule sets. Finally, it uses the bimodal risk assessment model to perform multi-dimensional risk matching and quantitative scoring on the structured data objects to obtain a comprehensive risk score, and generates a risk assessment report based on that score. The vision-enhanced hybrid parsing strategy accurately constructs the audit corpus from massive heterogeneous documents, with image processing algorithms addressing the parsing difficulties caused by poor scanned document quality. The bimodal risk assessment model, which combines hard and soft rules, addresses the low efficiency, incomplete coverage, and easy omission of hidden risks in existing manual review. Together these make audit compliance review more intelligent and accurate and improve review quality and efficiency.

Description of the Drawings

[0013] Figure 1 is an overall flowchart of the multimodal document parsing and evaluation method for bidding and tendering audit scenarios according to the present invention; Figure 2 is a schematic diagram of the multimodal intelligent split parsing strategy of the present invention; Figure 3 is a logical architecture diagram of the bimodal risk assessment of the present invention.

Detailed Implementation

[0014] The present invention will now be further described with reference to the accompanying drawings.

[0015] As shown in Figure 1, the present invention provides a multimodal document parsing and evaluation method for bidding and tendering audit scenarios, comprising the following steps. Step A collects the tender documents and bid documents to be audited to obtain a set of documents to be audited, and then identifies the document format attributes in that set to obtain the initial heterogeneous corpus. The specific steps are as follows:
Step A1: Collect the tender documents and bid documents to be audited to obtain the set of documents to be audited. Formula (1) defines this set as consisting of the tender document together with the bid document of each bidding unit, indexed by bidding unit.
Step A2: Identify the document format attributes in the set of documents to be audited to obtain the initial heterogeneous corpus. Specifically, read the binary file-header signature of each document in the set and identify the actual format of the file; then filter out the documents that fail format verification and the documents whose format is corrupted. The remaining documents form the initial heterogeneous corpus.
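Step A2 can be illustrated with a minimal sketch that identifies a file's actual format from its leading bytes and drops anything unrecognized. The signature table lists a few widely known magic numbers; the function names `sniff_format` and `build_initial_corpus` and the format labels are hypothetical and not taken from the filing, which does not say which formats are supported.

```python
from typing import Dict, Optional

# Leading-byte signatures of a few common formats (well-known magic numbers).
# Which formats the patented method actually accepts is not stated; this
# table is an assumption for illustration.
SIGNATURES = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip-container",                 # DOCX/XLSX are ZIP containers
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole2",    # legacy DOC/XLS
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}


def sniff_format(header: bytes) -> Optional[str]:
    """Return a format label from the leading bytes, or None if unrecognized."""
    for magic, label in SIGNATURES.items():
        if header.startswith(magic):
            return label
    return None


def build_initial_corpus(documents: Dict[str, bytes]) -> Dict[str, str]:
    """Map each document name to its detected format, dropping documents
    whose header matches no known signature (failed format verification)."""
    corpus = {}
    for name, data in documents.items():
        label = sniff_format(data[:16])
        if label is not None:
            corpus[name] = label
    return corpus
```

Checking the header rather than the file extension means a scanned image renamed to `.pdf`, or a truncated upload, is caught before parsing begins.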

[0016] As shown in Figure 2, step B performs page-level multimodal detection on the initial heterogeneous corpus to obtain detection markers, and then performs visual-feature-enhanced split parsing and text sequence reconstruction according to the detection markers to establish a full structured audit corpus. The specific steps are as follows:
Step B1: Perform page-level multimodal detection on the initial heterogeneous corpus to obtain detection markers. Specifically, raster-sample each page of each document in the initial heterogeneous corpus, use connected component analysis (CCA) to calculate the distribution density of page elements, and combine it with a lightweight convolutional neural network (CNN) that extracts the high-dimensional texture features of the page. Formula (2) gives the detection marker of each page as the output of a page complexity detection function based on visual perception. Its inputs are the page texture feature vector extracted by convolution operations and the distribution density feature of the black connected regions on the page, and the formula uses a preset texture complexity threshold.
Step B2: Perform visual-feature-enhanced split parsing and text sequence reconstruction according to the detection markers to establish the full structured audit corpus. The specific steps are as follows:
Step B21: Obtain the content of each single page, as defined by formula (3). When the detection marker of a page is true, the page is passed through an image preprocessing function and then through an optical character recognition model based on the Transformer architecture; the preprocessing function includes tilt correction based on the Radon transform and adaptive binarization denoising based on Otsu's algorithm. When the detection marker is false, the page content is obtained with a direct text-stream extraction algorithm for PDFs.
Step B22: Splice all single-page contents in page-number order and reorganize the paragraphs to obtain the full structured audit corpus.
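Two of the building blocks named in Step B, Otsu binarization and connected component analysis, can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: the page is a small 2-D list of 0..255 grey values, the CNN texture feature and the Radon-based tilt correction are omitted, and the routing rule in `detection_marker` (component density above a threshold `tau` selects the OCR path) is a hypothetical stand-in for the page complexity function of formula (2).

```python
from collections import deque


def otsu_threshold(gray):
    """Otsu's method: choose the threshold that maximizes between-class variance."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        if w_dark == 0:
            continue
        w_light = total - w_dark
        if w_light == 0:
            break
        sum_dark += t * hist[t]
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def count_dark_components(gray, threshold):
    """Count 4-connected regions of pixels at or below `threshold` (BFS flood fill)."""
    h, w = len(gray), len(gray[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if seen[y][x] or gray[y][x] > threshold:
                continue
            count += 1
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and gray[ny][nx] <= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def detection_marker(gray, tau):
    """Assumed routing rule: True selects preprocessing plus OCR,
    False selects direct text-stream extraction."""
    t = otsu_threshold(gray)
    density = count_dark_components(gray, t) / (len(gray) * len(gray[0]))
    return density > tau
```

Routing per page rather than per document is what lets a mostly digital PDF with a few stamped, scanned pages avoid running OCR on every page.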

[0017] Step C uses a large language model to extract audit elements from the full structured audit corpus to obtain structured data objects. The specific steps are as follows:
Step C1: Construct a prompt template for extracting audit elements. Specifically, the prompt template is built for a set of extraction targets, with one prompt for each target in the set.
Step C2: Based on the prompt template, use the large language model to extract audit elements from the full structured audit corpus to obtain structured data objects. Formula (4) expresses the output structured data object as the result of the reasoning process of the large language model.
Step C3: Perform a confidence check on the structured data object. If the confidence of the structured data object is lower than a preset confidence threshold, a manual review flag is triggered.
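Steps C1 to C3 can be sketched as a prompt builder, a model call, and a confidence gate. Everything specific here is an assumption for illustration: the prompt wording, the JSON answer shape with `fields` and `confidence` keys, the default threshold of 0.8, and the `fake_llm` stub that stands in for a real model call.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ExtractionResult:
    fields: Dict[str, str]
    confidence: float
    needs_manual_review: bool


def build_prompt(targets: List[str], corpus: str) -> str:
    """Step C1: one instruction line per extraction target, then the corpus."""
    lines = [f"- Extract the field '{t}'." for t in targets]
    return (
        "Extract the following audit elements and answer in JSON with keys "
        "'fields' and 'confidence' (0 to 1):\n"
        + "\n".join(lines)
        + "\n\nDocument:\n"
        + corpus
    )


def extract_elements(
    llm: Callable[[str], str],
    targets: List[str],
    corpus: str,
    threshold: float = 0.8,  # hypothetical confidence threshold
) -> ExtractionResult:
    """Steps C2 and C3: call the model, parse its answer, flag low confidence."""
    payload = json.loads(llm(build_prompt(targets, corpus)))
    confidence = float(payload["confidence"])
    return ExtractionResult(payload["fields"], confidence, confidence < threshold)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned JSON answer."""
    return json.dumps({"fields": {"bid_price": "1200000"}, "confidence": 0.62})
```

With the stub above, `extract_elements(fake_llm, ["bid_price"], "Total bid price: 1,200,000")` yields a result whose `needs_manual_review` flag is set, because 0.62 is below the assumed threshold.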

[0018] Step D constructs a hard rule set and a soft rule set based on the assessment intent that the user enters in natural language, and then establishes a bimodal risk assessment model from the hard rule set and the soft rule set. The specific steps are as follows:
Step D1: Construct the hard rule set and the soft rule set based on the assessment intent that the user enters in natural language. The specific steps are as follows:
Step D11: Construct the hard rule set. Specifically, extract explicit numerical constraints and qualification thresholds from the tender documents to establish the hard rule set.
Step D12: Construct the soft rule set. Specifically, receive the business instructions (Inst) that audit experts enter in natural language, then use a semantic mapping operator to convert the business instructions into the soft rule set, as defined by formula (5). The mapping operator converts natural language into logical rules and draws on an auditing industry knowledge base.
Step D2: Establish the bimodal risk assessment model from the hard rule set and the soft rule set. Specifically, the two rule sets are combined to obtain the bimodal risk assessment model; formula (6) gives the resulting complete rule set of the model.
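One way to picture the two rule sets of Step D is as two small record types that are combined into a single collection. The sketch below is illustrative only: the `HardRule` and `SoftRule` shapes are hypothetical, and `map_instruction` is a trivial sentence splitter standing in for the semantic mapping operator, which in the described method relies on a knowledge base and is not specified in detail.

```python
import operator
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass(frozen=True)
class HardRule:
    field: str        # name of the extracted audit element
    op: str           # one of ">=", "<=", "=="
    threshold: float  # numerical constraint taken from the tender document


@dataclass(frozen=True)
class SoftRule:
    description: str  # natural-language criterion to be judged semantically


OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq}


def map_instruction(instruction: str) -> List[SoftRule]:
    """Placeholder for the semantic mapping operator: one soft rule per sentence."""
    parts = [p.strip() for p in instruction.split(".")]
    return [SoftRule(p) for p in parts if p]


def build_rule_set(
    hard: Sequence[HardRule], soft: Sequence[SoftRule]
) -> Tuple[Union[HardRule, SoftRule], ...]:
    """Combine both sets into the complete rule set of the bimodal model."""
    return tuple(hard) + tuple(soft)
```

Keeping the two kinds of rule as distinct types lets the scoring stage dispatch hard rules to a deterministic comparison and soft rules to a language model.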

[0019] As shown in Figure 3, step E uses the bimodal risk assessment model to perform multi-dimensional risk matching and quantitative scoring on the structured data objects to obtain a comprehensive risk score, and then generates a risk assessment report based on the comprehensive risk score. The specific steps are as follows:
Step E1: Use the bimodal risk assessment model to perform multi-dimensional risk matching and quantitative scoring on the structured data objects to obtain a comprehensive risk score. Specifically, for the structured data objects, retrieve the matching tender document rule clauses and the bid response facts, then input them into the bimodal risk assessment model to perform hard rule comparison and soft semantic assessment and obtain the comprehensive risk score, as defined by formula (7). The terms of formula (7) are: the comprehensive risk score; the weighting coefficients for hard rules and soft rules, respectively; the total number of hard rules participating in the evaluation and the index of a hard rule clause; a deterministic logic comparison function applied to the bid response facts and the indexed hard rule; a hard rule penalty indicator function, which takes the value 1 if the bid response facts violate the indexed hard rule and 0 if they do not; the total number of soft rules participating in the evaluation and the index of a soft rule clause; and a semantic reasoning scoring function based on the large language model, which calculates the semantic deviation between the bid response facts and the indexed soft rule.
Step E2: Generate a risk assessment report based on the comprehensive risk score. The report includes the overall risk level, details of the risk points, the original text of the relevant clauses, and rectification suggestions.
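The scoring in Step E1 can be sketched as a weighted sum of hard rule violations and soft rule deviations. This is one reading of the term definitions, not a confirmed form of formula (7): the unnormalized sums, the weights 0.6 and 0.4, the rule representation, and the `stub_deviation` function (standing in for the LLM-based semantic scoring) are all assumptions.

```python
from typing import Callable, Dict, List, Tuple

Facts = Dict[str, float]
HardRule = Tuple[str, Callable[[float], bool]]  # (field name, compliance check)


def hard_violation(facts: Facts, rule: HardRule) -> int:
    """Penalty indicator: 1 if the facts violate the rule or omit the field, else 0."""
    field, complies = rule
    return 0 if field in facts and complies(facts[field]) else 1


def comprehensive_risk_score(
    facts: Facts,
    hard_rules: List[HardRule],
    soft_rules: List[str],
    semantic_deviation: Callable[[Facts, str], float],
    alpha: float = 0.6,  # hypothetical weight for hard rules
    beta: float = 0.4,   # hypothetical weight for soft rules
) -> float:
    """Weighted sum of hard rule violations and soft rule semantic deviations."""
    hard_part = sum(hard_violation(facts, r) for r in hard_rules)
    soft_part = sum(semantic_deviation(facts, r) for r in soft_rules)
    return alpha * hard_part + beta * soft_part


def stub_deviation(facts: Facts, rule: str) -> float:
    """Stand-in for the LLM-based scoring function; returns a fixed deviation."""
    return 0.5
```

Treating a missing field as a violation is a design choice made here so that an omitted qualification cannot pass silently; the filing does not say how missing facts are handled.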

[0020] A multimodal document parsing and evaluation system for bidding and tendering audit scenarios includes a document recognition module, a multimodal detection module, an audit element extraction module, an assessment model construction module, and a risk assessment module. The document recognition module collects the tender documents and bid documents to be audited to obtain a set of documents to be audited, then identifies the document format attributes in the set to obtain the initial heterogeneous corpus. The multimodal detection module performs page-level multimodal detection on the initial heterogeneous corpus to obtain detection markers, then performs visual-feature-enhanced split parsing and text sequence reconstruction according to the detection markers to establish a full structured audit corpus. The audit element extraction module uses a large language model to extract audit elements from the full structured audit corpus to obtain structured data objects. The assessment model construction module constructs a hard rule set and a soft rule set based on the assessment intent that the user enters in natural language, and then establishes a bimodal risk assessment model from the two rule sets. The risk assessment module uses the bimodal risk assessment model to perform multi-dimensional risk matching and quantitative scoring on the structured data objects to obtain a comprehensive risk score, and then generates a risk assessment report based on the comprehensive risk score.

[0021] To better illustrate the effects of this invention, a specific embodiment of using the method of this invention in a real audit project is described below.

[0022] In this embodiment, a set of bidding documents was selected as the test object. It included one 150-page tender document and three bid documents averaging 300 pages each, with approximately 15% being scanned copies. To verify the robustness of the invention in image processing, a simulation test set containing Gaussian noise and random rotation was constructed; the Gaussian noise had a mean of 0 and a variance of 0.01, and the random rotation ranged from -5° to +5°.
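The degradation described above can be sketched as follows. The noise parameters (mean 0, variance 0.01) and the rotation range (-5° to +5°) come from the text; the assumptions are that pixel intensities are scaled to [0, 1], that the rotation angle is drawn uniformly, that rotation uses nearest-neighbour sampling with a white fill, and that rotation is applied before noise. The function names are hypothetical.

```python
import math
import random


def add_gaussian_noise(img, rng, variance=0.01):
    """Add zero-mean Gaussian noise to a 2-D list of [0, 1] intensities, then clip."""
    sigma = math.sqrt(variance)
    return [
        [min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row]
        for row in img
    ]


def rotate_nearest(img, degrees, fill=1.0):
    """Rotate about the image centre using nearest-neighbour inverse mapping."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    a = math.radians(degrees)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # For each output pixel, look up the source pixel it came from.
            sx = cos_a * (x - cx) + sin_a * (y - cy) + cx
            sy = -sin_a * (x - cx) + cos_a * (y - cy) + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out


def degrade(img, rng):
    """Apply a random rotation in [-5, 5] degrees, then Gaussian noise."""
    angle = rng.uniform(-5.0, 5.0)
    return add_gaussian_noise(rotate_nearest(img, angle), rng), angle
```

Passing a seeded `random.Random` instance as `rng` makes a simulated test set reproducible across runs.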

[0023] This experiment compares the performance of traditional OCR processing strategies and visual enhancement-based split parsing strategies under harsh simulated environments; the document parsing performance and image processing robustness verification statistics of this embodiment are shown in Table 1.

[0024] Table 1. Comparison of document parsing performance and image processing robustness;

[0025] As shown in Table 1, thanks to the tilt correction based on the Radon transform and the Otsu adaptive binarization algorithm, the character recognition accuracy of this invention (reported via the character error rate, CER) improves by 13.7 percentage points when processing noisy and tilted simulated scans. At the same time, the intelligent split parsing strategy raises the overall processing speed by more than 20 times and reduces memory usage by about 74%.
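For reference, the character error rate is conventionally computed as the Levenshtein edit distance between the recognized text and the reference text, divided by the reference length. The sketch below shows that standard definition; whether the embodiment computes CER in exactly this way is not stated in the text.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, rc in enumerate(ref, 1):
        cur = [i]
        for j, hc in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (rc != hc),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate relative to the reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

A lower CER is better, so an accuracy gain of 13.7 percentage points corresponds to a CER reduction of the same size if accuracy is taken as one minus CER.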

[0026] This embodiment uses Qwen2.5-72B-Instruct as the base large language model and compares the identification performance of a traditional keyword-matching baseline and the bimodal assessment model across different risk types; the risk assessment accuracy results of this embodiment are shown in Table 2.

[0027] Table 2. Risk Perception Assessment Accuracy Evaluation Form;

[0028] As shown in Table 2, when dealing with soft risk identification tasks, such as identifying implicit unreasonable clauses and unbalanced pricing tendencies in technical solutions, the method of this invention has significant advantages over the traditional baseline method, with an F1 score improvement of 131.6%. This effectively solves the pain point of traditional methods that only understand data but not semantics, and greatly reduces the risk of audit omissions.
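The 131.6% figure can be read as a relative improvement over the baseline F1 score. Under that reading, and without assuming any specific values from Table 2 (which is not reproduced in this text), the relationship between the two scores works out as follows; the symbols $F1_{\text{base}}$ and $F1_{\text{new}}$ are notation introduced here.

```latex
\frac{F1_{\text{new}} - F1_{\text{base}}}{F1_{\text{base}}} = 1.316
\quad\Longrightarrow\quad
F1_{\text{new}} = 2.316 \, F1_{\text{base}}
```

Because an F1 score cannot exceed 1, this reading implies $F1_{\text{base}} \le 1 / 2.316 \approx 0.432$, which suggests that the keyword-matching baseline performs poorly on soft risks.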

[0029] In summary, the multimodal document parsing and evaluation method and system for bidding and tendering audit scenarios of the present invention first collects the tender documents and bid documents to be audited to obtain a set of documents to be audited. It then identifies the document format attributes in the set of documents to be audited to obtain an initial heterogeneous corpus, and performs page-level multimodal detection on the initial heterogeneous corpus to obtain detection tags. Based on the detection tags, it performs split parsing and text sequence reconstruction with visual feature enhancement to establish a full structured audit corpus. Subsequently, it uses a large language model to extract audit elements from the full structured audit corpus to obtain structured data objects. It then constructs a set of hard rules and a set of soft rules based on the evaluation intent that the user inputs in natural language, and establishes a bimodal risk assessment model based on the hard rule set and the soft rule set. Finally, the bimodal risk assessment model performs multi-dimensional risk matching and quantitative scoring on the structured data objects to obtain a comprehensive risk score, and a risk assessment report is generated based on the comprehensive risk score. The invention thereby accurately constructs an audit corpus from massive heterogeneous documents using a vision-enhanced hybrid parsing strategy and performs multimodal document parsing and assessment using image processing algorithms. Moreover, the bimodal risk assessment model, which combines hard rules and soft rules, not only solves the problems of low efficiency, incomplete coverage, and easy omission of hidden risks in existing manual review, but also solves the parsing difficulties caused by poor scanned document quality. This makes audit compliance review more intelligent and accurate and significantly improves the quality and efficiency of review.

[0030] The foregoing has shown and described the basic principles, main features, and advantages of the present invention. Those skilled in the art should understand that the present invention is not limited to the above embodiments. The embodiments and descriptions in the specification are merely illustrative of the principles of the invention. Various changes and modifications can be made to the invention without departing from its spirit and scope, and all such changes and modifications fall within the scope of the present invention as claimed. The scope of protection of the present invention is defined by the appended claims and their equivalents.

Claims

1. A multimodal document parsing and evaluation method for bidding and tendering audit scenarios, characterized in that it includes the following steps:
Step A: collect the tender documents and bid documents to be audited to obtain a set of documents to be audited, then identify the document format attributes in the set of documents to be audited to obtain an initial heterogeneous corpus;
Step B: perform page-level multimodal detection on the initial heterogeneous corpus to obtain detection tags, then perform split parsing and text sequence reconstruction based on visual feature enhancement according to the detection tags to establish a full structured audit corpus;
Step C: use a large language model to extract audit elements from the full structured audit corpus to obtain structured data objects;
Step D: construct a hard rule set and a soft rule set based on the assessment intent input by the user in natural language, then establish a bimodal risk assessment model based on the hard rule set and the soft rule set;
Step E: use the bimodal risk assessment model to perform multi-dimensional risk matching and quantitative scoring on the structured data objects to obtain a comprehensive risk score, then generate a risk assessment report based on the comprehensive risk score.

2. The multimodal document parsing and evaluation method for bidding and tendering audit scenarios according to claim 1, characterized in that, in Step A, collecting the tender documents and bid documents to be audited to obtain a set of documents to be audited, and then identifying the document format attributes in the set of documents to be audited to obtain the initial heterogeneous corpus, specifically comprises the following steps:
Step A1: collect the tender documents and bid documents to be audited to obtain the set of documents to be audited, as shown in formula (1):
$$D = \{D_T, D_B^{1}, D_B^{2}, \dots, D_B^{n}\} \tag{1}$$
where $D$ is the set of documents to be audited, $D_T$ is the tender document, and $D_B^{k}$ is the bid document of the $k$-th bidding unit;
Step A2: identify the document format attributes in the set of documents to be audited to obtain the initial heterogeneous corpus; specifically, read the binary file header signature of each document in the set of documents to be audited and identify the actual format of the corresponding file, then filter out the documents that fail format verification and the documents with corrupted formats to obtain the initial heterogeneous corpus.
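Identifying the actual format from the binary file header can be sketched as a lookup against well-known magic numbers. The signature table below contains standard values for common formats; which formats the claimed method accepts is not stated in the text, so the table contents and the names `sniff_format` and `build_initial_corpus` are assumptions for illustration.

```python
# Well-known leading byte signatures ("magic numbers") of common file formats.
MAGIC_SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip-container"),               # DOCX/XLSX are ZIP archives
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy DOC/XLS
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"II*\x00", "tiff"),                           # little-endian TIFF
    (b"MM\x00*", "tiff"),                           # big-endian TIFF
]


def sniff_format(header):
    """Return the format name for the leading bytes of a file, or None if unknown."""
    for signature, name in MAGIC_SIGNATURES:
        if header.startswith(signature):
            return name
    return None


def build_initial_corpus(documents):
    """Split (name, raw_bytes) pairs into accepted (name, format) and rejected names."""
    accepted, rejected = [], []
    for name, raw in documents:
        fmt = sniff_format(raw[:16])
        if fmt is None:
            # Unknown or damaged header: filtered out of the corpus.
            rejected.append(name)
        else:
            accepted.append((name, fmt))
    return accepted, rejected
```

Checking the header rather than the file extension catches renamed files, for example a scanned image saved with a `.pdf` suffix.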

3. The multimodal document parsing and evaluation method for bidding and tendering audit scenarios according to claim 2, characterized in that, in Step B, performing page-level multimodal detection on the initial heterogeneous corpus to obtain detection tags, and then performing split parsing and text sequence reconstruction based on visual feature enhancement according to the detection tags to establish the full structured audit corpus, specifically comprises the following steps:
Step B1: perform page-level multimodal detection on the initial heterogeneous corpus to obtain detection tags; specifically, perform raster sampling on each page of each document in the initial heterogeneous corpus, then use the connected component analysis (CCA) algorithm to calculate the distribution density of page elements, and combine it with a lightweight convolutional neural network (CNN) that extracts the high-dimensional texture features of the page to obtain the detection tags, as shown in formula (2):
$$T_i = \big[\, F\big(V(p_i), \rho(p_i)\big) > \theta \,\big] \tag{2}$$
where $T_i$ is the detection tag of the $i$-th page, $F$ is a page complexity detection function based on visual perception, $V(p_i)$ is the page texture feature vector extracted using convolution operations, $\rho(p_i)$ is the distribution density feature of the black connected regions on the page, $p_i$ is any page of a document in the initial heterogeneous corpus, and $\theta$ is a preset texture complexity threshold;
Step B2: perform split parsing and text sequence reconstruction based on visual feature enhancement according to the detection tags to establish the full structured audit corpus, specifically as follows:
Step B21: perform split parsing and text sequence reconstruction based on visual feature enhancement according to the detection tags to obtain the single-page content, as shown in formula (3):
$$C_i = \begin{cases} \mathrm{OCR}\big(\mathrm{Pre}(p_i)\big), & T_i = \text{true} \\ \mathrm{Ext}(p_i), & T_i = \text{false} \end{cases} \tag{3}$$
where $C_i$ is the single-page content, $\mathrm{OCR}$ is an optical character recognition model based on the Transformer architecture, $\mathrm{Pre}$ is an image preprocessing function that includes tilt correction based on the Radon transform and adaptive binarization denoising based on Otsu's algorithm, $\mathrm{Ext}$ is a direct text stream extraction algorithm for PDFs, $T_i = \text{true}$ means the detection tag is true, and $T_i = \text{false}$ means the detection tag is false;
Step B22: splice all single-page contents $C_i$ in order of page number $i$ and reorganize the paragraphs to obtain the full structured audit corpus $C$.
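The per-page routing and the Otsu binarization step can be sketched as follows. This is a simplified illustration under several assumptions: pages are routed by a precomputed boolean tag, the OCR model, preprocessing function, and PDF text extractor are passed in as hypothetical callables, pixels are 8-bit grayscale values, and the Radon-based tilt correction is omitted.

```python
def otsu_threshold(pixels):
    """Otsu's method on 8-bit values: pick t maximizing between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # intensity sum of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(pixels, threshold):
    """Map pixels above the threshold to 1 (background) and the rest to 0 (ink)."""
    return [1 if p > threshold else 0 for p in pixels]


def parse_page(page, needs_ocr, ocr, preprocess, extract_text):
    """Route one page: OCR on the preprocessed image, or direct text extraction."""
    return ocr(preprocess(page)) if needs_ocr else extract_text(page)


def rebuild_corpus(pages, tags, ocr, preprocess, extract_text):
    """Concatenate per-page content in page order."""
    return "\n".join(parse_page(page, tag, ocr, preprocess, extract_text)
                     for page, tag in zip(pages, tags))
```

Routing only the tagged pages through the image path is what allows born-digital pages to skip OCR entirely, which is the likely source of the speed and memory gains reported in the description.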

4. The multimodal document parsing and evaluation method for bidding and tendering audit scenarios according to claim 3, characterized in that, in Step C, using a large language model to extract audit elements from the full structured audit corpus to obtain structured data objects specifically comprises the following steps:
Step C1: construct a prompt template for extracting audit elements; specifically, the prompt template $P$ is set according to the target set for extraction, $P = \{q_1, q_2, \dots, q_m\}$, where $q_k$ denotes the prompt for the $k$-th target;
Step C2: based on the prompt template $P$, use the large language model to extract audit elements from the full structured audit corpus $C$ to obtain the structured data object, as shown in formula (4):
$$O = \mathrm{Infer}\big(P, C;\ \mathrm{LLM}\big) \tag{4}$$
where $O$ is the output structured data object, $\mathrm{Infer}$ is the inference process of the large language model, and $\mathrm{LLM}$ is the large language model;
Step C3: perform a confidence check on the structured data object; if the confidence of the structured data object is lower than the preset confidence threshold $\tau$, a manual review flag is triggered.
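The prompt construction and confidence check might look like the sketch below. The claim does not specify the prompt wording, the output schema, or how the confidence value is produced, so all of these are assumptions: `llm` is a hypothetical callable that takes a prompt string and returns a JSON string with `fields` and `confidence` keys, and the default threshold of 0.8 is arbitrary.

```python
import json


def build_prompt(targets, corpus_text):
    """Assemble one prompt listing every extraction target (assumed wording)."""
    lines = ["Extract the following audit elements and answer in JSON",
             'with the keys "fields" and "confidence" (a number between 0 and 1):']
    lines += [f"{k}. {target}" for k, target in enumerate(targets, start=1)]
    lines += ["", "Document:", corpus_text]
    return "\n".join(lines)


def extract_elements(targets, corpus_text, llm, threshold=0.8):
    """Run the extraction and flag low-confidence results for manual review."""
    raw = llm(build_prompt(targets, corpus_text))
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        # Unparseable output is treated as zero confidence.
        return {"fields": {}, "confidence": 0.0, "needs_manual_review": True}
    confidence = float(obj.get("confidence", 0.0))
    return {
        "fields": obj.get("fields", {}),
        "confidence": confidence,
        "needs_manual_review": confidence < threshold,
    }
```

Treating malformed model output as zero confidence keeps the failure path inside the same manual review mechanism instead of raising an exception mid-batch.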

5. The multimodal document parsing and evaluation method for bidding and tendering audit scenarios according to claim 4, characterized in that, in Step D, constructing a hard rule set and a soft rule set based on the assessment intent input by the user in natural language, and then establishing a bimodal risk assessment model based on the hard rule set and the soft rule set, specifically comprises the following steps:
Step D1: construct the hard rule set and the soft rule set based on the assessment intent input by the user in natural language, specifically as follows:
Step D11: construct the hard rule set; specifically, extract explicit numerical constraints and qualification thresholds from the tender document $D_T$ and establish the hard rule set $R_h$;
Step D12: construct the soft rule set; specifically, receive the business instruction $Inst$ input by an audit expert in natural language, then use a semantic mapping operator to transform the business instruction $Inst$ into the soft rule set, as shown in formula (5):
$$R_s = \Psi(Inst, K) \tag{5}$$
where $R_s$ is the soft rule set, $\Psi$ is the mapping operator that converts natural language into logical rules, and $K$ is the auditing industry knowledge base;
Step D2: establish the bimodal risk assessment model based on the hard rule set and the soft rule set; specifically, the hard rule set $R_h$ and the soft rule set $R_s$ are combined to obtain the bimodal risk assessment model, as shown in formula (6):
$$R = R_h \cup R_s \tag{6}$$
where $R$ is the complete rule set of the bimodal risk assessment model.
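One possible in-memory representation of the two rule sets and their combination is sketched below. The claim does not describe how the semantic mapping operator works internally; the clause splitting and keyword lookup used here are a deliberately naive stand-in, not the claimed operator, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardRule:
    """Explicit numerical constraint, e.g. registered capital >= a threshold."""
    field: str
    op: str          # one of ">=", "<=", "=="
    threshold: float


@dataclass(frozen=True)
class SoftRule:
    """Natural-language criterion to be judged semantically."""
    text: str


def map_instruction_to_soft_rules(instruction, knowledge_base):
    """Naive stand-in for the mapping operator: one rule per ';'-separated clause,
    annotated with knowledge-base notes whose term appears in the clause."""
    rules = []
    for clause in instruction.split(";"):
        clause = clause.strip()
        if not clause:
            continue
        notes = [note for term, note in knowledge_base.items()
                 if term in clause.lower()]
        text = clause if not notes else f"{clause} ({'; '.join(notes)})"
        rules.append(SoftRule(text))
    return rules


def build_rule_set(hard_rules, soft_rules):
    """Complete rule set as the union of the hard and soft rule sets."""
    return frozenset(hard_rules) | frozenset(soft_rules)
```

Keeping the two rule types as distinct classes lets the scoring stage dispatch hard rules to deterministic comparison and soft rules to semantic assessment without inspecting their contents.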

6. The multimodal document parsing and evaluation method for bidding and tendering audit scenarios according to claim 5, characterized in that, in Step E, using the bimodal risk assessment model to perform multi-dimensional risk matching and quantitative scoring on the structured data objects to obtain a comprehensive risk score, and then generating a risk assessment report based on the comprehensive risk score, specifically comprises the following steps:
Step E1: use the bimodal risk assessment model to perform multi-dimensional risk matching and quantitative scoring on the structured data objects to obtain the comprehensive risk score; specifically, retrieve the matching tender document rule clauses and bid response facts from the structured data object $O$, then input the retrieved tender document rule clauses and bid response facts into the bimodal risk assessment model to perform hard rule comparison and soft semantic assessment and obtain the comprehensive risk score, as shown in formula (7):
$$S = \alpha \sum_{i=1}^{N} \mathbb{I}\big(f_{\mathrm{cmp}}(F_b, r_i^{h})\big) + \beta \sum_{j=1}^{M} \Phi\big(F_b, r_j^{s}\big) \tag{7}$$
where $S$ is the comprehensive risk score; $\alpha$ and $\beta$ are the weighting coefficients for hard rules and soft rules, respectively; $N$ is the total number of hard rules participating in the evaluation; $i$ is the index of the hard rule clauses; $f_{\mathrm{cmp}}$ is the deterministic logic comparison function; $F_b$ is the bid response fact; $r_i^{h}$ is the $i$-th hard rule; $\mathbb{I}(\cdot)$ is the hard rule penalty indicator function, which takes the value 1 if the bid response fact $F_b$ violates the $i$-th hard rule $r_i^{h}$ and the value 0 if it does not; $M$ is the total number of soft rules participating in the evaluation; $j$ is the index of the soft rule clauses; $r_j^{s}$ is the $j$-th soft rule; and $\Phi$ is a semantic reasoning scoring function based on the large language model, used to calculate the semantic deviation between the bid response fact $F_b$ and the $j$-th soft rule $r_j^{s}$;
Step E2: generate the risk assessment report based on the comprehensive risk score $S$, the report including the overall risk level, details of the risk points, the original text of the relevant clauses, and rectification suggestions.
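The scoring step can be illustrated with a short sketch. It assumes the comprehensive score is an unnormalized weighted sum of the hard rule violation count and the summed soft rule deviations, which is one reading of the definitions in this claim. The weights, the tuple layout of hard rules, and the `soft_scorer` callable (standing in for the large-language-model-based scoring function) are all hypothetical.

```python
import operator

# Deterministic comparison operators available to hard rules.
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq}


def hard_penalty(facts, rule):
    """Indicator: 1 if the bid response fact violates the hard rule, else 0."""
    field, op, threshold = rule
    if field not in facts:
        return 1  # Assumption: a missing fact counts as a violation.
    return 0 if OPS[op](facts[field], threshold) else 1


def composite_risk_score(facts, hard_rules, soft_rules, soft_scorer,
                         alpha=0.6, beta=0.4):
    """Weighted sum of hard rule violations and soft rule semantic deviations."""
    hard_total = sum(hard_penalty(facts, rule) for rule in hard_rules)
    # soft_scorer(facts, rule_text) is assumed to return a deviation in [0, 1].
    soft_total = sum(soft_scorer(facts, rule) for rule in soft_rules)
    return alpha * hard_total + beta * soft_total
```

Because hard rules contribute whole units while soft rules contribute fractional deviations, the choice of weights determines how many soft concerns outweigh one explicit violation. Under a normalized variant, each sum would be divided by its rule count before weighting.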

7. A multimodal document parsing and evaluation system for bidding and tendering audit scenarios, wherein the specific parsing and evaluation process of the multimodal document parsing and evaluation system is based on the multimodal document parsing and evaluation method according to any one of claims 1-6, characterized in that it includes a document recognition module, a multimodal detection module, an audit element extraction module, an assessment model construction module, and a risk assessment module;
the document recognition module is used to collect the tender documents and bid documents to be audited to obtain a set of documents to be audited, then identify the document format attributes in the set of documents to be audited to obtain the initial heterogeneous corpus;
the multimodal detection module is used to perform page-level multimodal detection on the initial heterogeneous corpus to obtain detection tags, then perform split parsing and text sequence reconstruction based on visual feature enhancement according to the detection tags to establish a full structured audit corpus;
the audit element extraction module is used to extract audit elements from the full structured audit corpus using a large language model to obtain structured data objects;
the assessment model construction module is used to construct a hard rule set and a soft rule set based on the assessment intent input by the user in natural language, then establish a bimodal risk assessment model based on the hard rule set and the soft rule set;
the risk assessment module is used to perform multi-dimensional risk matching and quantitative scoring on the structured data objects using the bimodal risk assessment model to obtain a comprehensive risk score, then generate a risk assessment report based on the comprehensive risk score.