A Deep Learning-Based Method and System for Recognizing and Verifying the Directory Hierarchy of Bidding Documents

A deep learning model automatically identifies and verifies the bid document catalog, addressing the high cost and low accuracy of manually defined rules. The method automates the identification and verification of catalog levels, improves processing efficiency and accuracy, and adapts to different project scenarios.

CN122309740A · Pending · Publication Date: 2026-06-30 · BEIJING ZHONGHONG AN TECH DEV CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
BEIJING ZHONGHONG AN TECH DEV CO LTD
Filing Date
2026-04-30
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

In existing technologies, identifying and verifying the bid document catalog relies on manually defined rules. This leads to high rule maintenance costs and low accuracy when judging hierarchy levels with fuzzy boundaries, and mismatched, missing, or redundant entries in the catalog hierarchy cannot be detected automatically.

Method used

A deep learning-based approach is adopted to perform hierarchical recognition of the tender document template and user-imported catalog information through a deep learning model, construct a joint matching and alignment mechanism, perform multi-category anomaly classification and detection, and perform difference merging and correction to output the final catalog information.

Benefits of technology

It achieves automatic identification and verification of directory levels, reduces the cost of manual parsing, improves processing efficiency and accuracy, ensures the consistency and standardization of directories, and has continuous learning capabilities to adapt to different project scenarios.



Abstract

This invention relates to the field of intelligent document processing technology, specifically disclosing a method and system for identifying and verifying the directory hierarchy of tender documents based on deep learning. The invention obtains first directory information and second directory text information, performs hierarchical identification on the second directory text information using a deep learning model to obtain the second directory information; extracts hierarchical node features based on the first and second directory information respectively, constructs a joint matching alignment mechanism through node features, and outputs directory matching pairs; executes a multi-category anomaly classification and detection strategy based on the second-level directory matching pairs to determine the category of directory anomalies, and assesses whether manual intervention is needed by statistically analyzing the number of multi-category directory anomalies; and performs difference merging and correction on the second directory information based on the multi-category directory anomaly types to output the final directory information. This invention can improve the efficiency of tender document preparation and enhance the consistency and standardization of the directory structure.

Description

Technical Field

[0001] This invention relates to the field of intelligent document processing technology, and more specifically, to a method and system for identifying and verifying the directory hierarchy of tender documents based on deep learning.

Background Technology

[0002] In the bidding process, the preparation of bid documents relies heavily on an accurate understanding of, and response to, the structure of the bidding documents' table of contents. Bidders must strictly adhere to the requirements of the bidding documents and respond to their substantive requirements and conditions. The table of contents serves as the skeleton of the bid document, and its accuracy and completeness directly affect the evaluation experts' first impression, even carrying significant weight in certain scoring criteria. However, the industry currently faces the following major pain points. First, the format of bidding document templates varies greatly. Different industries and different procuring entities use different numbering methods, indentation methods, font formats, and so on, making it time-consuming and error-prone to interpret the hierarchical relationships manually line by line. According to industry research data, a typical bidding document table of contents contains an average of 50-200 hierarchical nodes, and manual parsing takes an average of 30-60 minutes with an error rate of approximately 5%-10%. Second, AI-generated and manually imported versions of the table of contents coexist, with no systematic alignment and verification mechanism between the two versions. In practice, bidders may use AI tools to generate the table of contents and also manually import tables of contents from historical projects, but existing tools cannot automatically compare the differences between the two versions. Third, mismatched, missing, and redundant hierarchical structures are difficult to detect automatically and require manual line-by-line comparison. When the two versions differ, manual comparison is not only inefficient but also prone to missing subtle hierarchical differences.

[0003] Therefore, it is necessary to provide a method and system for identifying and verifying the directory hierarchy of tender documents based on deep learning to solve the above-mentioned technical problems.

Summary of the Invention

[0004] To overcome the aforementioned shortcomings of existing technologies, this invention provides a deep learning-based method and system for identifying and verifying the directory hierarchy of tender documents. This method addresses the problems of existing directory identification relying on manual definition, high rule maintenance costs, the need to redefine rules for each new numbering format, and low accuracy in determining hierarchy levels with ambiguous boundaries.

[0005] To achieve the above objectives, the present invention provides the following technical solution: A deep learning-based method for identifying and verifying the directory hierarchy of tender documents includes the following steps: obtaining first directory information by parsing the tender document template, where the first directory information includes first-level text information, first-level node information, and first-level relationship information, and obtaining second directory text information through user import; performing hierarchical identification on the second directory text information based on a deep learning model to obtain second-level node information and second-level relationship information, which together form the second directory information; extracting hierarchical node features from the first directory information and the second directory information respectively, constructing a joint matching and alignment mechanism through the node features, and outputting directory matching pairs; executing a multi-category anomaly classification and detection strategy based on the second-level directory matching pairs to determine the multi-category directory anomaly types, and assessing whether manual intervention is needed by counting the number of multi-category directory anomalies; and performing difference merging and correction on the second directory information based on the multi-category directory anomaly types to output the final directory information.

[0006] As a further aspect of the present invention, the second directory text information is hierarchically identified based on a deep learning model to obtain the second-level node information and the second-level relationship information, thereby constructing the second directory information. The specific steps are as follows: The second directory text information is obtained and subjected to structured preprocessing to obtain text structure features; the text structure features include text semantic features and text position features. For the second directory text information, the text structure features of each line are extracted and input into the deep learning model to perform hierarchical recognition, and the BIOES scheme is used to hierarchically annotate the second directory text information. The second-level node information and the second-level relationship information are then extracted based on the hierarchical annotation.

[0007] As a further aspect of the present invention, hierarchical node features are extracted based on the first directory information and the second directory information, and a joint matching alignment mechanism is constructed through the node features to output directory matching pairs. The directory matching pairs include first-level directory matching pairs and second-level directory matching pairs. The joint matching alignment mechanism is specifically as follows: a first-level directory matching pair is obtained by performing a first-class matching alignment based on the hierarchical node features, and a second-level directory matching pair is output by performing path consistency verification based on the first-level directory matching pair.

[0008] As a further aspect of the present invention, hierarchical node features are extracted based on the first directory information and the second directory information, and a joint matching alignment mechanism is constructed through the node features to output directory matching pairs. The specific steps are as follows: First-level node information is extracted from the first directory information to construct a first-level node set; second-level node information is extracted from the second directory information to construct a second-level node set. Based on the first-level node information and the second-level node information, hierarchical node features are extracted respectively. The hierarchical node features include the first-level node features and the second-level node features. Based on the first-level node features and the second-level node features, a first-class matching alignment is performed, and the similarity of node sub-features is calculated respectively. The comprehensive matching similarity is calculated by comprehensively processing the node sub-feature similarities, and the node feature similarity matrix is obtained. Based on the node feature similarity matrix, the Hungarian algorithm is used to solve the optimal bipartite graph matching and obtain the first-level directory matching pairs. Based on the first-level directory matching pairs, the path consistency of the first directory information and the second directory information is checked, and the second-level directory matching pairs are output.

[0009] As a further aspect of the present invention, the first-level node features include first semantic node sub-features, first structural path sub-features, and first-level structural sub-features; the second-level node features include second semantic node sub-features, second structural path sub-features, and second-level structural sub-features; and the node sub-feature similarity includes first sub-feature similarity, second sub-feature similarity, and third sub-feature similarity.

[0010] As a further aspect of the present invention, the path consistency verification specifically involves: if the parent node of any level node information in the first-level directory matching pair is not aligned, the alignment result of the first-level directory matching pair is invalid, the first-level directory matching pairs with invalid alignment results are removed, and the remaining first-level directory matching pairs are selected as second-level directory matching pairs.
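The path consistency verification described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `filter_by_path_consistency`, the node identifiers, and the parent dictionaries are hypothetical, and "parent is aligned" is interpreted here as "the parents of the two matched nodes are themselves matched to each other, or both nodes are top-level".

```python
def filter_by_path_consistency(pairs, parent_a, parent_b):
    """Keep only matching pairs whose parents are aligned with each other.

    pairs:    list of (first_id, second_id) first-level directory matching pairs
    parent_a: dict mapping each first-directory node id to its parent id (or None)
    parent_b: dict mapping each second-directory node id to its parent id (or None)
    All three structures are hypothetical stand-ins for the directory information.
    """
    aligned = dict(pairs)  # first_id -> second_id
    kept = []
    for a, b in pairs:
        pa, pb = parent_a.get(a), parent_b.get(b)
        if pa is None and pb is None:
            # Both nodes are top-level, so there is no parent to check.
            kept.append((a, b))
        elif pa is not None and pb is not None and aligned.get(pa) == pb:
            # The two parents are matched to each other.
            kept.append((a, b))
        # Otherwise the pair is treated as an invalid alignment and dropped.
    return kept


parent_a = {"a1": None, "a2": "a1", "a3": "a1"}
parent_b = {"b1": None, "b2": "b1", "b3": "b9", "b9": None}
pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
second_level_pairs = filter_by_path_consistency(pairs, parent_a, parent_b)
```

The check runs as a single pass against the original alignment. Whether removal should cascade to descendants of a removed pair is not stated in the text, so the sketch does not attempt it.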

[0011] As a further aspect of the present invention, a multi-category anomaly classification and detection strategy is executed based on the second-level directory matching pair to determine the category directory anomaly type. The multi-category anomaly classification and detection strategy includes hierarchical mismatch detection, node missing detection, node redundancy detection, sequence anomaly detection, and content difference detection.
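One way the five detection categories could be derived from the matching pairs is sketched below. The text names the categories but does not define the detection logic at this point, so every rule here is an assumption: unmatched template nodes are treated as missing, unmatched imported nodes as redundant, and matched pairs are compared on level, text, and relative order. The field names `text`, `level`, and `order` are hypothetical.

```python
def classify_anomalies(first_nodes, second_nodes, pairs):
    """Return a list of (category, first_id, second_id) anomaly records.

    first_nodes / second_nodes: dict id -> {"text": str, "level": int, "order": int}
    pairs: list of (first_id, second_id) second-level directory matching pairs
    """
    anomalies = []
    matched_first = {a for a, _ in pairs}
    matched_second = {b for _, b in pairs}

    # Node missing: present in the template directory, absent from the import.
    for a in first_nodes:
        if a not in matched_first:
            anomalies.append(("node_missing", a, None))
    # Node redundancy: present in the import, absent from the template.
    for b in second_nodes:
        if b not in matched_second:
            anomalies.append(("node_redundant", None, b))

    for a, b in pairs:
        if first_nodes[a]["level"] != second_nodes[b]["level"]:
            anomalies.append(("level_mismatch", a, b))
        if first_nodes[a]["text"] != second_nodes[b]["text"]:
            anomalies.append(("content_difference", a, b))

    # Sequence anomaly: matched nodes appear in a different relative order.
    ordered = sorted(pairs, key=lambda p: first_nodes[p[0]]["order"])
    for (_, b_prev), (a_cur, b_cur) in zip(ordered, ordered[1:]):
        if second_nodes[b_cur]["order"] < second_nodes[b_prev]["order"]:
            anomalies.append(("sequence_anomaly", a_cur, b_cur))
    return anomalies


first = {
    "a1": {"text": "Project Overview", "level": 1, "order": 0},
    "a2": {"text": "Technical Solutions", "level": 1, "order": 1},
    "a3": {"text": "Quotation", "level": 1, "order": 2},
}
second = {
    "b1": {"text": "Project Overview", "level": 1, "order": 1},
    "b2": {"text": "Technical Solution", "level": 2, "order": 0},
    "b4": {"text": "Appendix", "level": 1, "order": 2},
}
found = classify_anomalies(first, second, [("a1", "b1"), ("a2", "b2")])
```

Exact string comparison is used for the content check only to keep the sketch short; a similarity threshold would be a natural alternative.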

[0012] As a further aspect of the present invention, the need for manual intervention is assessed by statistically analyzing the number of anomalies in multiple categories of directories. The specific steps are as follows: By counting the number of abnormal differences in multiple categories, if the number of abnormal differences in multiple categories is greater than or equal to the abnormal number threshold, manual intervention is required; if the number of abnormal differences in multiple categories is less than the abnormal number threshold, manual intervention is not required.
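The threshold rule above amounts to a single comparison. The sketch below assumes that the anomaly count is the total across all categories (the text could also be read as a per-category count) and that anomalies arrive as a list of category names. The function name and the threshold value are illustrative.

```python
from collections import Counter


def needs_manual_intervention(anomaly_categories, threshold):
    """Return (flag, per-category counts) for a list of anomaly category names.

    Manual intervention is required when the total number of anomalies is
    greater than or equal to the threshold, as stated in the text.
    """
    counts = Counter(anomaly_categories)
    total = sum(counts.values())
    return total >= threshold, counts


flag, counts = needs_manual_intervention(
    ["node_missing", "level_mismatch", "node_missing"], threshold=3
)
```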

[0013] As a further aspect of the present invention, the second directory information is merged and corrected based on the multi-category directory anomaly types to output the final directory information. The specific steps are as follows: Based on preset difference merging rules, merged directory information is generated and displayed to the user in an editable format, along with a difference detection report. The preset difference merging rules are as follows: for mismatched levels, the level from the first directory information is used; for missing or redundant nodes, the nodes are merged using both the first and second directory information. User modification operations on the directory information are received, the final directory is updated based on these operations, and user feedback data is recorded. The user feedback data is added to the training dataset for iterative optimization of the deep learning model.
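The preset difference merging rules can be illustrated with a small sketch. It assumes that "merging using both directories" means adding template nodes that are missing from the import while retaining the extra imported nodes. The node dictionaries, the `source` field, and the function name are hypothetical, and the node ids of the two directories are assumed not to collide.

```python
def merge_directories(first_nodes, second_nodes, pairs):
    """Apply the merging rules to produce merged directory information.

    first_nodes / second_nodes: dict id -> {"text": str, "level": int}
    pairs: list of (first_id, second_id) matching pairs
    """
    second_to_first = {b: a for a, b in pairs}
    merged = {}
    for b, node in second_nodes.items():
        entry = dict(node, source="second")
        a = second_to_first.get(b)
        if a is not None:
            # Mismatched level: the level of the first directory is used.
            entry["level"] = first_nodes[a]["level"]
            entry["source"] = "both"
        merged[b] = entry  # redundant (unmatched) nodes are retained as-is
    matched_first = {a for a, _ in pairs}
    for a, node in first_nodes.items():
        if a not in matched_first:
            # Missing node: supplemented from the first directory.
            merged[a] = dict(node, source="first")
    return merged


first = {"a1": {"text": "Technical Solutions", "level": 1},
         "a2": {"text": "Quotation", "level": 1}}
second = {"b1": {"text": "Technical Solutions", "level": 2},
          "b2": {"text": "Appendix", "level": 1}}
merged = merge_directories(first, second, [("a1", "b1")])
```

The sketch leaves node ordering and the editable presentation to the user out of scope; it only shows how the two rules interact on a per-node basis.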

[0014] A deep learning-based bid document directory hierarchy recognition and verification system includes a directory information acquisition module, a hierarchy recognition module, a joint matching and alignment module, an anomaly classification and detection module, and a directory information merging and correction module. The directory information acquisition module is used to obtain the first directory information based on the bidding document template. The first directory information includes the first-level text information, the first-level node information, and the first-level relationship information. The second directory text information is also obtained through user import. The hierarchical recognition module is used to perform hierarchical recognition on the text information of the second directory based on a deep learning model, and to obtain the second-level node information and the second-level relationship information to form the second directory information. The joint matching and alignment module is used to extract hierarchical node features based on the first directory information and the second directory information, construct a joint matching and alignment mechanism through the node features, and output directory matching pairs. The anomaly classification and detection module is used to execute multi-category anomaly classification and detection strategies based on second-level directory matching pairs, determine the anomaly types of the category directories, and assess whether manual intervention is required by statistically analyzing the number of anomalies in the multi-category directories. The directory information merging and correction module is used to merge and correct differences in the second directory information based on multiple categories of directory anomaly types, and output the final directory information.

[0015] The technical effects and advantages of this invention, a deep learning-based method and system for identifying and verifying the directory hierarchy of tender documents, are as follows: This invention automatically identifies the directory hierarchy using a deep learning model, rapidly converting unstructured text into a structured directory, significantly reducing manual parsing costs and improving processing efficiency. By integrating semantic, structural path, and hierarchical information into a joint matching mechanism, combined with path consistency constraints, it effectively avoids mismatches caused by relying solely on text similarity, significantly improving the accuracy and stability of directory alignment. Through a multi-category anomaly classification and detection mechanism, it comprehensively identifies structural and content problems in the directory and uses quantitative evaluation to determine whether manual intervention is needed, achieving a reasonable division of labor between automation and manual review. Through a difference merging and correction strategy, it ensures directory standardization while considering practical business flexibility, both supplementing missing content and retaining effective extended information. By introducing user feedback data for iterative model optimization, the system possesses continuous learning capabilities, constantly adapting to different project scenarios and user habits, thereby continuously improving performance and intelligence over long-term use. This invention not only improves the efficiency of tender document preparation but also enhances the consistency and standardization of the directory structure.

Attached Figure Description

[0016] Figure 1 is a flowchart of a deep learning-based bid document directory hierarchy identification and verification method provided in an embodiment of the present invention; Figure 2 is a system block diagram of a deep learning-based bid document directory hierarchy recognition and verification system provided in an embodiment of the present invention.

Detailed Implementation

[0017] The technical solutions of this invention will now be clearly and completely described with reference to the accompanying drawings. Obviously, the described technical solutions are only a part of this invention, and not all of it. All other technical solutions obtained by those skilled in the art based on the technical solutions of this invention without inventive effort are within the scope of protection of this invention.

[0018] Figure 1 shows a flowchart of a deep learning-based method for identifying and verifying the directory hierarchy of tender documents, provided in an embodiment of the present invention. The execution entity of the method shown in Figure 1 can be a software and/or hardware device. The execution entity of this application can include, but is not limited to, at least one of the following: user equipment, network equipment, etc. User equipment can include, but is not limited to, computers, smartphones, personal digital assistants (PDAs), and similar electronic devices. Network equipment can include, but is not limited to, a single network server, a server group consisting of multiple network servers, or a cloud consisting of a large number of computers or network servers based on cloud computing. Cloud computing is a type of distributed computing in which a group of loosely coupled computers forms one super virtual computer. This embodiment does not limit this. Steps S1 to S5 are detailed as follows:
Step S1: Obtain the first directory information based on the bidding document template. The first directory information includes the first-level text information, the first-level node information, and the first-level relationship information. Obtain the second directory text information through user import.
Step S2: Based on the deep learning model, perform hierarchical recognition on the second directory text information, and obtain the second-level node information and the second-level relationship information to form the second directory information.
Step S3: Extract hierarchical node features based on the first directory information and the second directory information respectively, construct a joint matching alignment mechanism through the node features, and output directory matching pairs.
Step S4: Based on the second-level directory matching pairs, execute the multi-category anomaly classification and detection strategy to determine the multi-category directory anomaly types, and evaluate whether manual intervention is required by counting the number of multi-category directory anomalies.
Step S5: Based on the multi-category directory anomaly types, perform difference merging and correction on the second directory information, and output the final directory information.

[0019] It should be noted that all parameter data involved in the calculations in this invention must be made dimensionless beforehand.

[0020] Preferably, the second directory text information is hierarchically identified based on a deep learning model to obtain second-level node information and second-level relationship information, thereby constructing the second directory information. The specific steps are as follows: The second directory text information is obtained and subjected to structured preprocessing to obtain text structure features; the structured preprocessing includes text cleaning, text segmentation, and text structure feature extraction; the text structure features include text semantic features and text position features. For the second directory text information, the text structure features of each line are extracted and input into the deep learning model to perform hierarchical recognition, and the BIOES annotation scheme is used to hierarchically annotate the second directory text information. The second-level node information and the second-level relationship information are then extracted based on the hierarchical annotation.

[0021] It should be noted that the deep learning model specifically includes: Input layer: a character embedding sublayer (128-dimensional character vector), a word embedding sublayer (256-dimensional word vector), and a position feature sublayer (line number position encoding, indentation level encoding), fused to output a 512-dimensional input vector; Encoding layer: a bidirectional Long Short-Term Memory (BiLSTM) network, in which the forward and backward LSTMs each have a hidden dimension of 256 and their outputs are concatenated to obtain a 512-dimensional context feature vector; Decoding layer: a linear-chain conditional random field (CRF) containing an emission matrix and a transition matrix, decoded using the Viterbi algorithm.

[0022] It should be noted that the BIOES annotation scheme is used to hierarchically annotate the second directory text information. In the position part of a tag, B indicates the beginning of a node, I the inside, E the end, and S a standalone (single-line) node; O marks non-directory content. In the level part, C indicates a chapter, S a section, A an item (article), P a clause, T a heading, and PA a paragraph. Specifically, the tags are: B-C: beginning of a chapter; I-C: inside a chapter; E-C: end of a chapter; S-C: standalone chapter; B-S: beginning of a section; I-S: inside a section; E-S: end of a section; S-S: standalone section; B-A: beginning of an item; I-A: inside an item; E-A: end of an item; S-A: standalone item; B-P: beginning of a clause; I-P: inside a clause; E-P: end of a clause; S-P: standalone clause; B-T: beginning of a heading; I-T: inside a heading; E-T: end of a heading; S-T: standalone heading; B-PA: beginning of a paragraph; I-PA: inside a paragraph; E-PA: end of a paragraph; S-PA: standalone paragraph; O: non-directory content.

[0023] The constraints on the above hierarchical annotations include: a B-X tag can only be followed by an I-X tag, an E-X tag, or an S-X tag, where X represents any level type, including C, S, A, P, and PA; an I-X tag can only be followed by another I-X tag or an E-X tag; an E-X tag cannot be followed by an I-X tag; and transitions between different level types are constrained by level depth, for example, a chapter level type cannot be directly followed by a paragraph level type.

[0024] In an embodiment of the present invention, by introducing a hierarchical recognition mechanism based on deep learning, the original directory text is automatically converted into structured directory information, significantly improving the automation and accuracy of directory processing.

[0025] Specifically, the user copies the following directory fragment from the engineering tender document: "I. Project Overview; (I) Project Background; 1. Construction Necessity; 2. Construction Objectives; (II) Implementation Scope; II. Technical Solutions; (I) Overall Design; 1. System Architecture; 2. Technical Route Description...". During the copying process, the original format information of this directory may be lost, such as unclear indentation, mixed Chinese and Arabic numerals in the numbering, and even line breaks in some lines. First, obtain the second directory text information and perform structured preprocessing operations. In the text cleaning stage, extra blank lines, abnormal symbols, and residual format characters are removed; in the text segmentation stage, the text is divided into a line-by-line sequence according to the line break or punctuation rules; then, text structure feature extraction is carried out. On the one hand, text semantic features are extracted, such as capturing the semantic meaning of keywords like "Project Overview" and "Technical Solutions" through word segmentation or character sequences. On the other hand, text position features are extracted, including line number positions and indentation levels inferred by counting leading spaces or tab characters. This step makes the original unordered text initially possess "modelable" structural information.
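The cleaning, segmentation, and position-feature steps described above can be sketched in a few lines. This is an illustrative sketch only: the function name `preprocess`, the `tab_width` and `indent_unit` parameters, and the choice of which characters to strip are assumptions, and semantic feature extraction (word segmentation, embeddings) is left out.

```python
import re


def preprocess(raw, tab_width=4, indent_unit=2):
    """Clean pasted directory text and attach simple position features.

    Returns one record per non-empty line with its line number, an indentation
    level inferred from leading whitespace, and the stripped text.
    """
    rows = []
    for line in raw.splitlines():           # text segmentation by line break
        # Text cleaning: drop zero-width characters left over from copying.
        line = re.sub("[\u200b\ufeff]", "", line)
        if not line.strip():
            continue                         # remove extra blank lines
        expanded = line.expandtabs(tab_width)
        leading = len(expanded) - len(expanded.lstrip(" "))
        rows.append({
            "line_no": len(rows),
            "indent_level": leading // indent_unit,
            "text": expanded.strip(),
        })
    return rows


raw_text = "I. Project Overview\n\n  (I) Project Background\n\t1. Construction Necessity\n"
rows = preprocess(raw_text)
```

Counting leading spaces is only one possible signal. When indentation has been lost during copying, as the paragraph above notes can happen, the numbering pattern itself has to carry the hierarchy information.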

[0026] Construct a corresponding structural feature vector for each line of text. Specifically, each line of text has both character-level representation and word-level representation. The character embedding sublayer maps each character to a 128-dimensional vector to capture fine-grained text patterns, such as numbering features like "1." and "(I)"; the word embedding sublayer maps the segmented words to 256-dimensional vectors to express semantic information, for example, to distinguish between "Project Background" and "Technical Solutions"; the position feature sublayer encodes the line number and indentation level, enabling the model to perceive the context order and potential hierarchical relationship. After the fusion of the three types of features, a 512-dimensional input vector is formed, providing a unified input for the subsequent deep learning model.

[0027] The input vector sequence is input into the encoding layer, that is, a bidirectional LSTM structure. The forward LSTM reads the directory text from top to bottom to capture the semantic rule of "gradual expansion of chapters"; the backward LSTM models in reverse from bottom to top to supplement the influence of the subsequent text on the current line. For example, the meaning of some titles needs to be fully understood in combination with subsequent sub-items. The outputs of the two are concatenated to form a 512-dimensional context feature vector, making each line not only contain its own information but also integrate context semantics and structural information. This is particularly crucial for dealing with the situation of "semantically similar but hierarchically different" in the directory.

[0028] In the decoding stage, a linear-chain CRF structure is introduced to globally optimize the label sequence through the emission matrix and the transition matrix, and the Viterbi algorithm is used to solve the optimal label path. Here, the BIOES annotation method is adopted for the label system to finely annotate each line of text. For example, "I. Project Overview" may be annotated as S-C (independent chapter), while "(I) Project Background" is annotated as S-S (independent section), and "1. Construction Necessity" under it is annotated as S-A (independent article). For cases across multiple lines, such as when a title is split by a line break, combined labels like B-T, I-T, E-T may occur. In this way, not only can it be identified which level each line belongs to, but also its position in the node can be recognized.
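To make the decoding step concrete, the following is a minimal pure-Python Viterbi sketch over per-line emission scores and pairwise transition scores. The scores, the three-label set, and the convention that unlisted transitions score 0.0 are illustrative assumptions; in the described model the emission scores would come from the BiLSTM encoder and the transition scores from the trained CRF.

```python
NEG_INF = float("-inf")


def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label path.

    emissions:   list (one entry per line) of dict label -> score
    transitions: dict (prev_label, cur_label) -> score; missing pairs score 0.0,
                 forbidden pairs should be set to -inf
    """
    best = [{lab: emissions[0].get(lab, NEG_INF) for lab in labels}]
    backpointers = []
    for em in emissions[1:]:
        scores, ptrs = {}, {}
        for cur in labels:
            # Pick the best previous label for reaching `cur` on this line.
            prev = max(labels, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + em.get(cur, NEG_INF)
            ptrs[cur] = prev
        best.append(scores)
        backpointers.append(ptrs)
    # Trace back from the best final label.
    path = [max(labels, key=lambda lab: best[-1][lab])]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    return path[::-1]


labels = ["B-T", "E-T", "S-C"]
emissions = [
    {"B-T": 2.0, "E-T": 0.0, "S-C": 1.0},
    {"B-T": 1.5, "E-T": 1.0, "S-C": 0.0},
]
# A heading that has begun must not start again or jump to a standalone chapter.
transitions = {("B-T", "B-T"): NEG_INF, ("B-T", "S-C"): NEG_INF}
decoded = viterbi(emissions, transitions, labels)
```

In this toy input, picking the highest emission on each line independently would yield B-T twice; the forbidden transition forces the globally consistent path instead, which is the role the paragraph above assigns to the transition matrix.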

[0029] To ensure the structural legality of the annotation results, a series of label transition constraint rules are introduced in the CRF layer. For example, after B-C, only I-C or E-C or S-C can follow, and it cannot directly jump to the article-level label; after I-A, only I-A can continue or end with E-A to avoid incorrect truncation of articles. In addition, a hierarchical depth constraint is introduced. For example, after the chapter level, the paragraph level cannot directly appear and must go through the section or article level for transition. These constraints effectively avoid the problem of incorrect hierarchical judgment due to semantic similarity in actual scenarios, making the model output more in line with the real directory logic.
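The position-related transition rules can be expressed as a small predicate that a CRF layer could use to mask forbidden transitions. The sketch below encodes only the three rules stated in the text for B, I, and E tags; the hierarchical depth constraint is deliberately left out because its exact form is not fully specified here. The helper names are hypothetical.

```python
def split_tag(tag):
    """Split 'B-PA' into ('B', 'PA'); the tag 'O' has no level type."""
    if tag == "O":
        return "O", None
    position, level = tag.split("-", 1)
    return position, level


def is_allowed(prev_tag, cur_tag):
    """Check the B/I/E transition rules described in the text."""
    p_pos, p_level = split_tag(prev_tag)
    c_pos, c_level = split_tag(cur_tag)
    if p_pos == "B":
        # After B-X only I-X, E-X, or S-X may follow.
        return c_level == p_level and c_pos in ("I", "E", "S")
    if p_pos == "I":
        # After I-X only I-X or E-X may follow.
        return c_level == p_level and c_pos in ("I", "E")
    if p_pos == "E":
        # E-X cannot be followed by I-X.
        return not (c_pos == "I" and c_level == p_level)
    return True  # no rule is stated for transitions out of S-X or O


def forbidden_transitions(labels):
    """List every (prev, cur) pair that the rules above rule out."""
    return [(p, c) for p in labels for c in labels if not is_allowed(p, c)]


blocked = forbidden_transitions(["B-A", "I-A", "E-A", "S-A", "O"])
```

The list returned by `forbidden_transitions` could be used to set the corresponding transition scores to negative infinity before decoding.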

[0030] After completing the BIOES annotation, the second-level node information is further extracted based on the label sequence. The specific approach is to identify the node boundaries according to the B / I / E / S label combinations. For example, from B-S to E-S forms a complete "section" node, thereby extracting the node text, node type, start and end line numbers, and corresponding structural features. At the same time, the second-level relationship information is constructed according to the appearance order and hierarchical type of the nodes. For example, by comparing the priority of node types and the indentation or numbering rules, it is determined which "section" a certain "article" node belongs to, and then the parent-child relationship is established. In the above example, "Construction Necessity" is identified as an article-level node, its parent node is "Project Background", and the parent node of "Project Background" is "Project Overview", finally forming a complete tree structure.
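The node extraction and parent assignment described above can be sketched as follows. The depth ordering in `LEVEL_DEPTH` (chapter, section, article, clause, paragraph) is an assumption inferred from the level types listed earlier, heading (T) tags are not handled, and the stack-based parent rule stands in for the "priority of node types" comparison mentioned in the text.

```python
LEVEL_DEPTH = {"C": 0, "S": 1, "A": 2, "P": 3, "PA": 4}  # assumed ordering


def extract_nodes(lines, tags):
    """Group tagged lines into nodes using the B/I/E/S position labels."""
    nodes, buffer, start = [], [], None
    for i, (text, tag) in enumerate(zip(lines, tags)):
        if tag == "O":
            continue                       # non-directory content
        position, level = tag.split("-", 1)
        if position in ("B", "S"):
            buffer, start = [text], i      # a new node begins on this line
        else:
            buffer.append(text)            # I or E continues the open node
        if position in ("E", "S"):
            nodes.append({"text": " ".join(buffer), "type": level,
                          "start": start, "end": i, "parent": None})
    return nodes


def link_parents(nodes):
    """Set each node's parent to the nearest preceding node of a shallower level."""
    stack = []  # indices of currently open ancestors
    for idx, node in enumerate(nodes):
        depth = LEVEL_DEPTH[node["type"]]
        while stack and LEVEL_DEPTH[nodes[stack[-1]]["type"]] >= depth:
            stack.pop()
        node["parent"] = stack[-1] if stack else None
        stack.append(idx)
    return nodes


lines = ["I. Project Overview", "(I) Project Background", "1. Construction Necessity",
         "2. Construction Objectives", "(II) Implementation Scope"]
tags = ["S-C", "S-S", "S-A", "S-A", "S-S"]
tree = link_parents(extract_nodes(lines, tags))
```

With the example directory from the text, "Construction Necessity" ends up under "Project Background", which in turn sits under "Project Overview".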

[0031] Through the above process, the originally chaotic second directory text information is transformed into clearly structured second directory information, including the node set and the hierarchical relationship set. In real business, this result can be directly used for subsequent directory alignment, difference detection, and automatic correction. For example, when this manually imported directory needs to be compared with the AI-generated directory, it already has a standardized structure and no longer requires manual parsing, significantly reducing labor cost. At the same time, since the model has learned from a large amount of directory data in different formats during training, it generalizes well to various numbering methods, such as "1.", "(I)", and "①", and to different layout styles, and can adapt to complex and changeable real-world tender document scenarios.

[0032] The embodiments of the present invention can effectively solve the problems of inconsistent directory formats and difficulty in identifying hierarchical levels. Through multi-feature fusion, context modeling and structural constraint decoding, it can achieve automatic conversion from unstructured text to structured directory, providing a high-quality foundation for subsequent directory verification and optimization, and significantly improving the efficiency and accuracy of bid document preparation.

[0033] Preferably, hierarchical node features are extracted based on the first directory information and the second directory information, and a joint matching alignment mechanism is constructed through the node features to output directory matching pairs. The directory matching pairs include first-level directory matching pairs and second-level directory matching pairs. The joint matching alignment mechanism is as follows: a first-level directory matching pair is obtained by performing a first-class matching alignment based on the hierarchical node features, and a second-level directory matching pair is output by performing path consistency verification based on the first-level directory matching pair.

[0034] Preferably, hierarchical node features are extracted based on the first directory information and the second directory information, and a joint matching alignment mechanism is constructed using the node features to output directory matching pairs. The specific steps are as follows: First-level node information is extracted based on the first directory information, and a first-level node set $A = \{a_1, a_2, \ldots, a_i, \ldots\}$ is constructed, where $a_1$ is the first first-level node information in the first directory information, $a_2$ is the second first-level node information in the first directory information, and $a_i$ is the i-th first-level node information in the first directory information; second-level node information is extracted based on the second directory information, and a second-level node set $B = \{b_1, b_2, \ldots, b_j, \ldots\}$ is constructed, where $b_1$ is the first second-level node information in the second directory information, $b_2$ is the second second-level node information in the second directory information, and $b_j$ is the j-th second-level node information in the second directory information; each level node information includes node text, parent node information, path information, and node order index; Based on the first-level node information and the second-level node information, hierarchical node features are extracted respectively. The hierarchical node features include first-level node features and second-level node features. The first-level node features include first semantic node sub-features, first structural path sub-features, and first-level structural sub-features. The second-level node features include second semantic node sub-features, second structural path sub-features, and second-level structural sub-features. Based on the first-level node features and the second-level node features, a first-class matching alignment is performed, and the node sub-feature similarities are calculated separately. The comprehensive matching similarity is calculated by comprehensively processing the node sub-feature similarities, and a node feature similarity matrix is obtained. The node sub-feature similarities include the first sub-feature similarity, the second sub-feature similarity, and the third sub-feature similarity. Based on the node feature similarity matrix, the Hungarian algorithm is used to solve the optimal bipartite graph matching and obtain the first-level directory matching pairs; Based on the first-level directory matching pairs, the path consistency of the first directory information and the second directory information is checked, and the second-level directory matching pairs are output. The path consistency check is as follows: if the parent node of any level node information in a first-level directory matching pair is not aligned, the alignment result of that first-level directory matching pair is invalid. The first-level directory matching pairs with invalid alignment results are removed, and the remaining first-level directory matching pairs are taken as the second-level directory matching pairs.

[0035] It should also be noted that the node sub-feature similarities are calculated as follows: $s^{(1)}_{ij} = \dfrac{e^{A}_{i} \cdot e^{B}_{j}}{\lVert e^{A}_{i} \rVert \, \lVert e^{B}_{j} \rVert}$; in the formula, $s^{(1)}_{ij}$ is the first sub-feature similarity, between the first semantic node sub-feature and the second semantic node sub-feature, $e^{A}_{i}$ is the first semantic node sub-feature corresponding to the i-th first-level node information, and $e^{B}_{j}$ is the second semantic node sub-feature corresponding to the j-th second-level node information; $s^{(2)}_{ij} = \dfrac{p^{A}_{i} \cdot p^{B}_{j}}{\lVert p^{A}_{i} \rVert \, \lVert p^{B}_{j} \rVert}$; in the formula, $s^{(2)}_{ij}$ is the second sub-feature similarity, between the first structural path sub-feature and the second structural path sub-feature, $p^{A}_{i}$ is the first structural path sub-feature corresponding to the i-th first-level node information, and $p^{B}_{j}$ is the second structural path sub-feature corresponding to the j-th second-level node information; $s^{(3)}_{ij} = \dfrac{h^{A}_{i} \cdot h^{B}_{j}}{\lVert h^{A}_{i} \rVert \, \lVert h^{B}_{j} \rVert}$; in the formula, $s^{(3)}_{ij}$ is the third sub-feature similarity, between the first-level structural sub-feature and the second-level structural sub-feature, $h^{A}_{i}$ is the first-level structural sub-feature corresponding to the i-th first-level node information, and $h^{B}_{j}$ is the second-level structural sub-feature corresponding to the j-th second-level node information.

[0036] The comprehensive matching similarity is calculated by comprehensively processing the node sub-feature similarities, resulting in a node feature similarity matrix. The comprehensive matching similarity is calculated as follows: $S_{ij} = \lambda_1 s^{(1)}_{ij} + \lambda_2 s^{(2)}_{ij} + \lambda_3 s^{(3)}_{ij}$; in the formula, $S_{ij}$ is the comprehensive matching similarity between the i-th first-level node information and the j-th second-level node information, $s^{(1)}_{ij}$, $s^{(2)}_{ij}$, and $s^{(3)}_{ij}$ are the first, second, and third sub-feature similarities, $\lambda_1$ is the weighting coefficient of the first sub-feature similarity, $\lambda_2$ is the weighting coefficient of the second sub-feature similarity, and $\lambda_3$ is the weighting coefficient of the third sub-feature similarity.
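
The similarity calculation described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the short toy vectors, the dictionary keys (`semantic`, `path`, `level`), and the weights (0.5, 0.3, 0.2) are assumptions made for the example, since the text gives no weight values.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def fused_similarity(f1, f2, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of the semantic, path, and level cosine similarities.
    The default weights are illustrative assumptions."""
    sims = (
        cosine(f1["semantic"], f2["semantic"]),  # first sub-feature similarity
        cosine(f1["path"], f2["path"]),          # second sub-feature similarity
        cosine(f1["level"], f2["level"]),        # third sub-feature similarity
    )
    return sum(w * s for w, s in zip(weights, sims))

def similarity_matrix(first_nodes, second_nodes, weights=(0.5, 0.3, 0.2)):
    """Rows: nodes of the first directory; columns: nodes of the second directory."""
    return [[fused_similarity(a, b, weights) for b in second_nodes] for a in first_nodes]

# Toy features: node_b differs from node_a only in its one-hot level type.
node_a = {"semantic": [1.0, 0.0], "path": [0.0, 1.0], "level": [1, 0, 0]}
node_b = {"semantic": [1.0, 0.0], "path": [0.0, 1.0], "level": [0, 1, 0]}
matrix = similarity_matrix([node_a], [node_a, node_b])
```

With these toy inputs, the pair that differs only in level type loses exactly the weight assigned to the third sub-feature similarity.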

[0037] In addition, the specific steps for extracting hierarchical node features are as follows: Semantic node sub-features: a pre-trained Chinese language model (BERT-wwm-ext) is used. The hierarchical node information is input into the model, and the output at the position of the special [CLS] token in BERT-wwm-ext is taken as a 768-dimensional semantic vector. Structural path sub-features: the complete path of a node in the hierarchical node information is encoded into a 128-dimensional vector. Hierarchical structure sub-features: the hierarchical type of a node is represented by one-hot encoding.

[0038] In one embodiment of the present invention, the directory of a bid document often comes from more than one source. On the one hand, there is first directory information automatically generated by the system from the tender document template; on the other hand, there is second directory information that bidders derive from historical projects or compile manually. Since the two directories differ in expression, hierarchical division, ordering, and textual description, aligning them automatically becomes a key issue in improving the efficiency of bid document preparation. This embodiment of the present invention constructs a joint matching alignment mechanism based on hierarchical node features to achieve two-stage directory alignment from coarse-grained to fine-grained.

[0039] Specifically, the process begins by acquiring the first directory information, for example a clearly structured tree-like directory from a standard template: "Chapter 1 Project Overview, 1.1 Project Background, 1.1.1 Necessity of Construction, 1.1.2 Construction Objectives, Chapter 2 Technical Solutions." At the same time, the second directory information is acquired, for example text copied by the user from a historical project: "Project Overview, Background Description, Necessity Analysis, Objective Setting, Technical Design Scheme." Although the two directories correspond semantically, the second may have non-standard hierarchical identifiers, missing numbering, or a different order. Therefore, a node-level mapping must be established between the two directories to support subsequent difference detection and automatic correction.

[0040] By parsing the information in the first and second directories respectively, hierarchical node information is extracted, constructing a first-level node set and a second-level node set. For the first directory, a first-level node set is obtained, where each node contains node text, parent node information, path information (e.g., "Chapter 1 → 1.1 → 1.1.1"), and a node sequence index indicating its position within the same level. Similarly, a second-level node set is extracted from the second directory, and each node is assigned an information description with the same structure. This unifies the two directories, which originally had inconsistent formats, into a structured node set, providing a foundation for subsequent feature modeling.
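
A node record with the four fields named above (node text, parent node information, path information, and node order index) might be represented as follows. This is a sketch under stated assumptions: the class name `DirNode`, the function `flatten`, and the nested-tuple input format are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DirNode:
    text: str               # node text
    parent: Optional[str]   # text of the parent node, None for a top-level node
    path: str               # e.g. "Chapter 1 → 1.1 → 1.1.1"
    order: int              # position among siblings at the same level

def flatten(tree, parent=None, prefix=""):
    """Turn a nested list of (text, children) tuples into a flat node list
    in document order."""
    nodes = []
    for order, (text, children) in enumerate(tree):
        path = f"{prefix} → {text}" if prefix else text
        nodes.append(DirNode(text, parent, path, order))
        nodes.extend(flatten(children, parent=text, prefix=path))
    return nodes

tree = [
    ("Chapter 1 Project Overview", [
        ("1.1 Project Background", [
            ("1.1.1 Necessity of Construction", []),
            ("1.1.2 Construction Objectives", []),
        ]),
    ]),
    ("Chapter 2 Technical Solutions", []),
]
nodes = flatten(tree)
```

Storing the parent and the full path on every node means the later matching and consistency steps can work on a flat list without walking the tree again.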

[0041] After the node set is constructed, hierarchical node features are further extracted. For each node in the first directory, first-level node features are extracted, including first semantic node sub-features, first structural path sub-features, and first-level structural sub-features. For each node in the second directory, second-level node features are extracted, including second semantic node sub-features, second structural path sub-features, and second-level structural sub-features. In practice, semantic node sub-features are obtained through a pre-trained language model. For example, the node text is input into the BERT-wwm-ext model, and the output of the CLS position is extracted as a 768-dimensional semantic vector. This vector can effectively represent the semantic content of the node. For example, "project background" and "background description" have high similarity in semantic space. Structural path sub-features are implemented by encoding the path of the node in the directory tree. For example, "Chapter 1 → 1.1 → 1.1.1" is converted into a fixed-length path vector, thus reflecting the positional relationship of the node in the overall structure. Hierarchical structural sub-features use one-hot encoding to represent node types, such as chapter, section, and item, thus providing hierarchical category information.
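
The two structural sub-features can be illustrated with a short Python sketch. The text does not say how a path is encoded into a fixed-length vector, so the hashing scheme in `encode_path` is a stand-in assumption, not the method of the invention; the type names in `LEVEL_TYPES` follow the chapter, section, and item examples above. The 768-dimensional semantic vector is omitted because it requires loading the pre-trained language model.

```python
import hashlib

LEVEL_TYPES = ("chapter", "section", "item")  # node types named in the text

def one_hot_level(level_type):
    """One-hot encoding of the node's hierarchical type."""
    return [1.0 if t == level_type else 0.0 for t in LEVEL_TYPES]

def encode_path(path, dim=128):
    """Stand-in path encoder (assumption): hash each path component, tagged
    with its depth, into one of `dim` buckets and count the hits."""
    vec = [0.0] * dim
    for depth, part in enumerate(path.split(" → "), start=1):
        digest = hashlib.md5(f"{depth}:{part}".encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    return vec

path_vec = encode_path("Chapter 1 → 1.1 → 1.1.1")
level_vec = one_hot_level("section")
```

Because each component is hashed together with its depth, two paths that share leading components share the corresponding buckets, which gives the cosine similarity something structural to measure.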

[0042] After feature extraction, the first stage of the joint matching alignment mechanism, namely first-class matching alignment, begins. Specifically, for each pair formed by one node from the first-level node set and one node from the second-level node set, three sub-feature similarities are calculated. The first sub-feature similarity is the cosine similarity of the semantic node sub-features, which measures how close the node texts are in meaning; the second sub-feature similarity is the cosine similarity of the structural path sub-features, which measures the consistency of the two nodes' structural paths; and the third sub-feature similarity is the cosine similarity of the hierarchical structure sub-features, which measures the consistency of the nodes' hierarchical types. Subsequently, a comprehensive matching similarity is calculated by weighted fusion. In practical applications, the weight coefficients are set from experience or tuned on training data. For example, in scenarios with large semantic differences, the structural path weight is increased to enhance matching stability.

[0043] Based on the above comprehensive matching similarity, a node feature similarity matrix is constructed, with one row for each node in the first directory and one column for each node in the second directory. Each element in the matrix represents the degree of matching between a node in the first directory and a node in the second directory. The Hungarian algorithm is then applied to the matrix, treating the task as an optimal bipartite graph matching problem: the overall matching similarity is maximized while each node is matched to at most one node in the other directory. This yields the first-level directory matching pairs, a set of globally optimal node correspondences. For example, "Project Overview" might match "Project Summary," and "Technical Solution" might match "Technical Design Solution," thus achieving preliminary semantic alignment.
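
The assignment step can be sketched as below. This is a compact pure-Python version of the Hungarian algorithm (the potentials formulation), written for illustration; the function name `hungarian_max` and the toy matrix are assumptions, and in practice a library routine such as SciPy's `linear_sum_assignment` could be used instead.

```python
def hungarian_max(sim):
    """Return sorted (row, col) pairs that maximize total similarity,
    using each row and each column at most once."""
    n, m = len(sim), len(sim[0])
    if n > m:  # the loop below needs rows <= cols, so solve the transpose
        transposed = [[sim[i][j] for i in range(n)] for j in range(m)]
        return sorted((i, j) for j, i in hungarian_max(transposed))
    INF = float("inf")
    cost = [[-sim[i][j] for j in range(m)] for i in range(n)]  # maximize -> minimize
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)   # row and column potentials
    p, way = [0] * (m + 1), [0] * (m + 1)     # p[j]: 1-based row assigned to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [INF] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:      # reached a free column: augmenting path found
                break
        while j0:               # flip assignments along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sorted((p[j] - 1, j - 1) for j in range(1, m + 1) if p[j])

# A greedy pick of 0.9 first would force the poor 0.1 pairing; the optimal
# assignment takes 0.8 + 0.85 instead.
pairs = hungarian_max([[0.9, 0.8],
                       [0.85, 0.1]])
```

The toy matrix shows why a global solver is used rather than picking the best score row by row: the highest single score is not part of the best overall assignment.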

[0044] However, first-level matching results based solely on similarity may still contain structural inconsistencies. For example, "construction necessity" might be incorrectly matched to "technical route description": although the two share some textual similarity, their positions in the directory structure are completely different. Therefore, this embodiment of the invention further introduces a path consistency verification mechanism to filter and correct the first-level directory matching pairs. For each pair of matched nodes, it checks whether the pair's parent nodes also correspond in the matching set. This imposes a "parent node dependency constraint" on child node matching and avoids erroneous cross-level matches.

[0045] After completing the path consistency check, matching pairs that satisfy the parent node alignment constraints are retained to form second-level directory matching pairs. Compared with first-level matching, second-level matching not only maintains semantic consistency but also structurally satisfies the directory hierarchy, thus better reflecting real business needs. For example, in the example above, the match between "Project Background" and "Background Description" is only retained if its parent nodes "Project Overview" and "Project Summary" match; otherwise, it will be discarded, thus ensuring the consistency of the overall directory structure.
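
The parent-alignment filter can be sketched as follows. It assumes a strict reading of the check, in which a pair is kept only when both nodes are top-level or their parents are themselves a matched pair; the function and variable names are hypothetical, and the filter is applied in a single pass.

```python
def path_consistent_pairs(pairs, parent_first, parent_second):
    """Keep only matches whose parents are also matched to each other.

    pairs: list of (first_node, second_node) first-level matches
    parent_first / parent_second: node -> parent node (absent for top level)
    """
    matched = set(pairs)
    kept = []
    for a, b in pairs:
        pa, pb = parent_first.get(a), parent_second.get(b)
        if pa is None and pb is None:      # both are top-level nodes
            kept.append((a, b))
        elif (pa, pb) in matched:          # parents are aligned with each other
            kept.append((a, b))
    return kept

first_level_pairs = [
    ("Project Overview", "Project Summary"),
    ("Project Background", "Background Description"),
    ("Construction Necessity", "Technical Route Description"),
]
parent_first = {
    "Project Background": "Project Overview",
    "Construction Necessity": "Project Background",
}
parent_second = {
    "Background Description": "Project Summary",
    "Technical Route Description": "Technical Design",
}
second_level_pairs = path_consistent_pairs(first_level_pairs, parent_first, parent_second)
```

Here the third pair is dropped because "Project Background" and "Technical Design" are not matched to each other, which mirrors the cross-level mismatch example given earlier.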

[0046] This invention effectively achieves automatic alignment between complex directories. On one hand, the first-level matching based on multi-feature fusion and the Hungarian algorithm ensures globally optimal semantic alignment; on the other hand, the second-level matching based on path consistency further strengthens structural constraints and significantly improves matching accuracy. In practical applications, directory matching, which originally required manual comparison of each directory, can be completed within seconds. It also exhibits good adaptability to complex hierarchical directories, providing a reliable foundation for subsequent difference detection, directory correction, and automatic generation of tender documents.

[0047] Preferably, a multi-category anomaly classification and detection strategy is executed based on the second-level directory matching pairs to determine the multi-category directory anomaly types. The specific steps are as follows: The multi-category anomaly classification and detection strategy includes hierarchical mismatch detection, node missing detection, node redundancy detection, sequence anomaly detection, and content difference detection. Hierarchical mismatch detection: the hierarchical depths of corresponding nodes in the first directory information and the second directory information are compared; if the absolute value of the difference in hierarchical depth is greater than or equal to 1, a hierarchical mismatch is determined. Node missing detection: the nodes in the second directory information are traversed; if a node has no corresponding node in the first directory information and its comprehensive matching similarity is less than 0.6, a missing node is determined. Node redundancy detection: the nodes in the first directory information are traversed; if a node has no corresponding node in the second directory information and its comprehensive matching similarity is less than 0.6, a redundant node is determined. Sequence anomaly detection: the order of corresponding nodes in the first directory information and the second directory information is compared; if the order is inconsistent, a sequence anomaly is determined. Content difference detection: the edit distance or the semantic similarity between the corresponding texts in the first and second directory information is calculated; if the edit distance is greater than 3 or the semantic similarity is less than 0.85, a content difference is determined.

[0048] Preferably, the need for manual intervention is assessed by statistically analyzing the number of anomalies in multiple categories of directories. The specific steps are as follows: By counting the number of abnormal differences in multiple categories, if the number of abnormal differences in multiple categories is greater than or equal to the abnormal number threshold, manual intervention is required; if the number of abnormal differences in multiple categories is less than the abnormal number threshold, manual intervention is not required.

[0049] In one embodiment of the present invention, in order to achieve automatic directory verification and intelligent error correction, after obtaining the matching pairs of the second-level directories, the embodiment of the present invention introduces a multi-category anomaly classification and detection strategy to systematically identify the differences between directories and further determine whether manual intervention is required, thereby significantly reducing the cost of manual proofreading in real applications.

[0050] Suppose the first directory information generated from the standard template has the structure "Chapter 1 Project Overview, 1.1 Project Background, 1.1.1 Necessity of Construction, 1.1.2 Construction Objectives, Chapter 2 Technical Solution, 2.1 System Architecture, 2.2 Technical Route," while the second directory information compiled by the bidder's staff from historical projects is "Project Overview, Background Description, Necessity Analysis, Objective Setting, Technical Design, System Structure, Implementation Path Description." The aforementioned matching and alignment mechanism has already established the correspondence between most nodes. For example, "Project Overview" corresponds to "Project Overview," and "Project Background" corresponds to "Background Description." Based on this, the multi-category anomaly classification and detection strategy is then applied.

[0051] Hierarchical mismatch detection compares the hierarchical depths of matched nodes in the two directories. For example, if "Construction Necessity" is a third-level node in the first directory but appears as a second-level node in the second, the difference in hierarchical depth is 1, which satisfies the condition that the absolute value is greater than or equal to 1, so a hierarchical mismatch is determined. In actual business operations, this type of problem usually stems from omitting intermediate levels or merging structures during manual organization. If left unaddressed, it leads to an irregular directory structure and can hinder the evaluators' understanding of the document's logic.

[0052] Node missing detection traverses all nodes in the second directory information. For example, if "Implementation Path Description" has no corresponding node in the first directory, and the comprehensive matching similarity between this node and every node in the first directory is less than 0.6, then this node is determined to be a "missing node". In real-world scenarios, this often indicates that the bid document has added content not required by the template, or that different wording was used and no match was found. The node is marked as an anomaly so that a later step can decide whether it should be retained or deleted.

[0053] Conversely, node redundancy detection traverses the nodes in the first directory. For example, if a "Technical Route" node exists in the first directory but no corresponding node is found in the second directory, and the similarity is less than 0.6, it is judged to be a "redundant node". This situation is quite common in practice and usually means that the bid document has omitted a key section required by the tender document. This is a serious problem that may directly affect the compliance of the bid, so it must be marked.
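
The three structural checks described so far (hierarchical mismatch, missing node, redundant node) can be sketched together. The anomaly labels follow the naming used in the text; the function name, the dictionary-based inputs, and the toy similarity value are assumptions for illustration, and unlisted node pairs are treated as having similarity 0.0.

```python
def structural_anomalies(first_depth, second_depth, pairs, sim, threshold=0.6):
    """first_depth / second_depth: node -> hierarchical depth
    pairs: matched (first_node, second_node) pairs
    sim: (first_node, second_node) -> comprehensive matching similarity"""
    anomalies = []
    for a, b in pairs:
        if abs(first_depth[a] - second_depth[b]) >= 1:
            anomalies.append(("hierarchy_mismatch", a, b))
    matched_first = {a for a, _ in pairs}
    matched_second = {b for _, b in pairs}
    for b in second_depth:   # unmatched node of the second directory
        if b not in matched_second and all(
            sim.get((a, b), 0.0) < threshold for a in first_depth
        ):
            anomalies.append(("missing_node", None, b))
    for a in first_depth:    # unmatched node of the first directory
        if a not in matched_first and all(
            sim.get((a, b), 0.0) < threshold for b in second_depth
        ):
            anomalies.append(("redundant_node", a, None))
    return anomalies

first_depth = {"Project Background": 2, "Construction Necessity": 3, "Technical Route": 2}
second_depth = {"Background Description": 2, "Necessity Analysis": 2,
                "Implementation Path Description": 2}
matched = [("Project Background", "Background Description"),
           ("Construction Necessity", "Necessity Analysis")]
sim = {("Technical Route", "Implementation Path Description"): 0.4}  # toy value
found = structural_anomalies(first_depth, second_depth, matched, sim)
```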

[0054] After verifying node existence, a sequence anomaly detection is performed. For matched node pairs, their order indices in their respective directories are compared. For example, if "System Architecture" precedes "Technical Route" in the first directory, but "Implementation Path Description" precedes "System Structure" in the second directory, resulting in an inconsistent node order, this is considered a sequence anomaly. In real-world applications, while incorrect directory order may not necessarily affect content completeness, it reduces document standardization and can even lead to point deductions in certain rigorous evaluation scenarios, thus requiring adjustment.
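
One way to express the order comparison is to look for matched pairs whose relative order is inverted between the two directories. This is a sketch of that idea; the function name and the use of plain integer order indices are assumptions.

```python
def sequence_anomalies(pairs, order_first, order_second):
    """Return pairs of matches whose relative order differs between directories.

    order_first / order_second: node -> order index in its own directory
    """
    ranked = sorted(pairs, key=lambda p: order_first[p[0]])
    inversions = []
    for i in range(len(ranked)):
        for j in range(i + 1, len(ranked)):
            # ranked[i] comes first in the first directory; flag it if it
            # comes later in the second directory.
            if order_second[ranked[i][1]] > order_second[ranked[j][1]]:
                inversions.append((ranked[i], ranked[j]))
    return inversions

order_pairs = [("System Architecture", "System Structure"),
               ("Technical Route", "Implementation Path Description")]
order_first = {"System Architecture": 0, "Technical Route": 1}
order_second = {"Implementation Path Description": 0, "System Structure": 1}
inverted = sequence_anomalies(order_pairs, order_first, order_second)
```

The pairwise comparison is quadratic in the number of matches, which is acceptable for directories of a few hundred nodes.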

[0055] For matching nodes, calculate the text edit distance or semantic similarity. For example, "construction goals" and "goal setting" are semantically close, and their semantic similarity may be higher than 0.85, so they will not be judged as abnormal. However, "technical route" and "implementation path description," although containing some similar words, have significant semantic differences. If the semantic similarity is lower than 0.85 or the edit distance is greater than 3, it is judged as a content difference. In actual business, such differences usually reflect inconsistent expression or content deviation from template requirements, and it is necessary to decide whether to make uniform corrections based on the specific scenario.
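
The content check can be sketched with a standard Levenshtein edit distance and the two thresholds given in the text (3 and 0.85). The text allows either metric; this sketch assumes the semantic similarity is used when the caller supplies one and the edit distance otherwise. The similarity values in the example are made up, since computing them requires the language model.

```python
def edit_distance(s, t):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution or match
        prev = cur
    return prev[-1]

def is_content_difference(text_a, text_b, semantic_similarity=None,
                          max_edits=3, min_similarity=0.85):
    """Flag a content difference using whichever metric is available."""
    if semantic_similarity is not None:
        return semantic_similarity < min_similarity
    return edit_distance(text_a, text_b) > max_edits

# Illustrative similarity scores (assumed, not computed here).
same_meaning = is_content_difference("construction goals", "goal setting",
                                     semantic_similarity=0.9)
different = is_content_difference("technical route", "implementation path description",
                                  semantic_similarity=0.6)
```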

[0056] The five types of anomaly detection described above provide a comprehensive picture of directory discrepancies. In a typical project, the detection results might include: 2 instances of hierarchical mismatch, 1 instance of missing node, 2 instances of redundant nodes, 1 instance of sequence anomaly, and 3 instances of content discrepancies. To further determine the appropriate handling method, an anomaly count mechanism is introduced. Specifically, all anomalies are counted and compared with a preset anomaly count threshold. For example, if the threshold is set to 5, a total of 9 detected anomalies indicates a serious directory discrepancy requiring manual intervention; conversely, a total of 3 anomalies indicates a minor discrepancy requiring no manual intervention. In large infrastructure projects with high requirements for directory standardization, a lower threshold can be set to increase the proportion of manual review; while in internal reviews or rapid bidding scenarios, the threshold can be appropriately increased to improve the efficiency of automated processing. Furthermore, a comprehensive judgment can be made by combining the weights of anomaly types, for example, assigning higher weights to "redundant nodes" and "missing nodes," while assigning lower weights to "sequence anomalies" and "content discrepancies," thereby achieving more refined decision-making.
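
The counting rule and its weighted variant can be sketched as follows, using the counts and the threshold of 5 from the example above. The specific weight values are hypothetical; the text only says that missing and redundant nodes may be weighted higher than sequence and content anomalies.

```python
def needs_manual_review(counts, threshold=5, weights=None):
    """Return True when the (optionally weighted) anomaly total reaches the threshold."""
    if weights is None:
        total = sum(counts.values())
    else:
        total = sum(weights.get(kind, 1.0) * n for kind, n in counts.items())
    return total >= threshold

typical_project = {
    "hierarchy_mismatch": 2,
    "missing_node": 1,
    "redundant_node": 2,
    "sequence_anomaly": 1,
    "content_difference": 3,
}
review_typical = needs_manual_review(typical_project)             # total 9 vs threshold 5
review_minor = needs_manual_review({"content_difference": 3})     # total 3 vs threshold 5

# Hypothetical weights that emphasize missing and redundant nodes.
weights = {"missing_node": 2.0, "redundant_node": 2.0,
           "sequence_anomaly": 0.5, "content_difference": 0.5}
review_weighted = needs_manual_review({"redundant_node": 2, "missing_node": 1},
                                      weights=weights)
```

Note that weighting changes the scale of the total, so the threshold would need to be recalibrated whenever the weights are changed.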

[0057] This invention, through a multi-category anomaly classification and detection strategy based on second-level directory matching pairs, can comprehensively identify structural and content problems in the directory. Combined with an anomaly quantity assessment mechanism, it determines whether manual intervention is required. This not only improves the automation level of directory verification but also effectively reduces the cost of manual review. At the same time, it ensures the standardization and consistency of tender documents, and has significant engineering application value.

[0058] Preferably, the second directory information is merged and corrected based on the multi-category directory anomaly types to output the final directory information. The specific steps are as follows: Based on the preset difference merging rules, the merged directory information is generated and displayed to the user in an editable format, along with a difference detection report. The preset difference merging rules are as follows: for mismatched levels, the level in the first directory information is used; for missing or redundant nodes, the union of the nodes in the first and second directory information is taken. User modifications to the directory information are received, including node level adjustment, node deletion, node addition, and node order adjustment; the final directory is updated based on the user modifications, and user feedback data is recorded. The user feedback data is added to the training dataset for iterative optimization of the deep learning model.

[0059] In one embodiment of the present invention, first directory information generated from a standard template and second directory information obtained through manual collation have been obtained. Multiple anomaly types have been identified through the aforementioned steps, including hierarchical mismatch, missing nodes, redundant nodes, sequence anomalies, and content differences. At this point, a preliminary merged directory is generated based on the preset difference merging rules. For hierarchical mismatches, the hierarchical structure in the first directory information is used as the standard. For example, if "Construction Necessity" is located below "Project Background" in the first directory, but this node has been incorrectly promoted to the same level as "Project Background" in the second directory, it is automatically moved back to the third level, and the levels of its child nodes are adjusted accordingly. This avoids the hierarchical confusion that manual collation tends to introduce and ensures that the directory conforms to the structural specifications of the tender documents.

[0060] To address the issues of missing and redundant nodes, a union strategy is employed. Specifically, if the second directory lacks certain nodes required by the first directory, these nodes are extracted from the first directory and inserted into their corresponding positions. Nodes present in the second directory but not included in the first directory are not directly deleted; instead, they are merged with nodes from the first directory and retained. This "union strategy" is particularly important in real-world scenarios because tender documents often need to reflect the company's unique characteristics while meeting bidding requirements. Completely deleting extra nodes could lead to information loss; therefore, the merging method ensures both standardization and flexibility.
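
The merging rules can be sketched as below. This is a simplified illustration under several assumptions: node texts are unique, each directory is given as a flat list of (text, depth) in document order, matched nodes take the template's text and level, and a node found only in the second directory is placed after the subtree of the most recent matched node that preceded it. The function name and the `template` / `user` source tags are hypothetical.

```python
def merge_directories(first, second, pairs):
    """first / second: lists of (text, depth) in document order
    pairs: matched (first_text, second_text) pairs"""
    to_first = {b: a for a, b in pairs}
    # Start from the template, so nodes absent from the second directory are kept.
    merged = [(text, depth, "template") for text, depth in first]
    anchor = None  # last merged node seen while walking the second directory
    for text, depth in second:
        if text in to_first:
            anchor = to_first[text]
            continue
        if anchor is None:
            pos = len(merged)
        else:
            idx = next(i for i, node in enumerate(merged) if node[0] == anchor)
            pos = idx + 1
            while pos < len(merged) and merged[pos][1] > merged[idx][1]:
                pos += 1  # skip over the anchor's descendants
        merged.insert(pos, (text, depth, "user"))
        anchor = text
    return merged

first = [("Project Overview", 1), ("Project Background", 2),
         ("Technical Solution", 1), ("Technical Route", 2)]
second = [("Project Summary", 1), ("Background Description", 2),
          ("Implementation Experience Summary", 1), ("Technical Design", 1)]
matches = [("Project Overview", "Project Summary"),
           ("Project Background", "Background Description"),
           ("Technical Solution", "Technical Design")]
merged = merge_directories(first, second, matches)
```

In this example "Technical Route" survives because it comes from the template, and "Implementation Experience Summary" survives because the union rule keeps nodes that only the second directory contains.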

[0061] After automatic merging, the generated directory information is displayed to the user in an editable format, such as a tree structure or a document outline, with each node supporting expansion, collapse, and editing. Simultaneously, a discrepancy detection report is displayed, detailing each type of anomaly, including mismatched nodes, missing nodes, redundant nodes, and differences in order and content, presented visually in the directory through highlighting or marking. For example, nodes with adjusted hierarchies might be marked in blue, newly added nodes in green, and potentially deleted nodes in red, helping users quickly understand the changes made.

[0062] In practice, users can further adjust the directory structure based on this visual interface. User modifications mainly include adjusting node hierarchy, deleting nodes, adding nodes, and changing node order. For example, in some cases, a user might consider the "Implementation Experience Summary" section, although not in the bidding template, to have significant display value, and therefore manually retain it and move it to the "Appendix" section; or a user might find the automatically inserted "Technical Route" node's position unreasonable and drag it to a more suitable location; or a user might believe that certain node descriptions are inaccurate and directly modify the text content. The backend responds to these operations in real time and dynamically updates the directory structure, including parent-child relationships, path information, and node order indexes, ensuring that the final directory always maintains structural consistency.

[0063] More importantly, every modification made by the user is recorded, forming user feedback data. This data includes not only changes in the node structure before and after the modification, but also the user's preference for handling certain types of differences. For example, if a user chooses to keep "redundant nodes" in multiple projects, their preference can be learned; or if a user frequently adjusts a certain type of hierarchical structure, it indicates that the current automatic rules still have room for optimization in that scenario. This feedback data is stored uniformly and serves as an important data source for subsequent model training.

[0064] During the model optimization phase, user feedback data is integrated with the original training dataset for iterative training of the deep learning model. For example, in the hierarchical recognition model, user-corrected real hierarchical labels can be introduced as supervision signals to improve the model's ability to recognize complex directory structures; in the node matching model, user-confirmed matching relationships can be used as positive samples to improve the accuracy of similarity calculation; and in the differential classification model, thresholds or weight parameters can be dynamically adjusted based on whether users have modified a certain type of anomaly. Through this continuous learning mechanism, it is possible to gradually adapt to the usage habits of different domains and users, realizing the transformation from "general rule-driven" to "personalized intelligent optimization".

[0065] In the traditional model, a tender catalog containing hundreds of nodes often requires manual verification item by item, taking hours or even longer. However, the embodiments of this invention can complete automatic merging and preliminary correction within seconds. Users only need to adjust a few key nodes to complete the final confirmation, significantly reducing workload. Furthermore, by continuously learning from user feedback, its automatic correction capability will continuously improve with increased usage, further enhancing the processing efficiency of subsequent projects.

[0066] This invention, through a difference merging and correction mechanism based on multiple categories of directory anomalies, combined with visual editing and user feedback learning, achieves a closed-loop process from "automatic generation" to "intelligent optimization" of the directory. This not only ensures the standardization and integrity of the directory structure but also takes into account the flexible needs of actual business, and has significant engineering application value and promotion significance.

[0067] Example 2: In one embodiment of the present invention, the deep learning-based bid document directory hierarchy recognition and verification method is applied in a real bidding process. The tendering party typically provides a standardized tender document, which specifies the directory structure required of the bid document. First, the tender document template is parsed to obtain the first directory information. This process includes identifying and structuring the directory chapters in the template and extracting the first-level text information, such as "Chapter 1 Project Overview" and "Chapter 2 Technical Solution." The first-level node information and first-level relationship information are then parsed to clarify the parent-child relationships and path structure between nodes. For example, "Project Background" is a child node of "Project Overview," and "Construction Necessity" is a lower-level node of "Project Background." This forms a complete directory tree that serves as the benchmark for subsequent alignment.

[0068] Meanwhile, bidders typically import an existing bid document table of contents as the second directory. These directories often originate from historical projects or are manually compiled, resulting in inconsistent formats, unclear hierarchies, and inconsistent numbering. For example, the imported directory might read "Project Overview, Background Description, Necessity Analysis, Goal Setting, Technical Design, System Structure," with no clear hierarchical identifiers. After this second directory text information is obtained, the process proceeds to the deep learning-driven hierarchical recognition stage.

[0069] First, the text in the second directory undergoes structured preprocessing, including text cleaning, text segmentation, and feature extraction. Then, each line of text is input into a deep learning model for sequence labeling. Using the BIOES annotation system, the hierarchical type of each line of text within the directory, such as chapter, section, or entry, is identified, and its position within the nodes is further determined. Based on this, second-level node information is extracted, including node text, node type, and node position. Simultaneously, second-level relationship information is constructed by combining contextual relationships, thus forming a complete second directory information structure.
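
As an illustration of how BIOES labels on text lines might be turned into node records, the sketch below groups tagged lines into nodes with text, type, and position. The tag format (`B-section`, `E-section`, `S-chapter`, `O`), the rule that a multi-line node is joined with spaces, and the function name are all assumptions; the labels themselves would come from the sequence labeling model, which is not shown.

```python
def decode_bioes(lines, tags):
    """Group BIOES-tagged lines into node dictionaries.

    S-x: single-line node; B-x ... I-x ... E-x: multi-line node; O: not a node.
    """
    nodes, buffer, current = [], [], None
    for position, (line, tag) in enumerate(zip(lines, tags)):
        prefix, _, label = tag.partition("-")
        if prefix == "S":
            nodes.append({"text": line, "type": label, "position": position})
            buffer, current = [], None
        elif prefix == "B":
            buffer, current = [line], (label, position)
        elif prefix in ("I", "E") and current and current[0] == label:
            buffer.append(line)
            if prefix == "E":
                nodes.append({"text": " ".join(buffer), "type": label,
                              "position": current[1]})
                buffer, current = [], None
        else:  # 'O' or an inconsistent tag: drop any partial node
            buffer, current = [], None
    return nodes

lines = ["Project Overview", "Background", "Description", "Page 1"]
tags = ["S-chapter", "B-section", "E-section", "O"]
decoded = decode_bioes(lines, tags)
```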

[0070] After obtaining two sets of structured directories, node feature extraction and matching alignment are further performed. For each node in the first and second directories, semantic features, structural path features, and hierarchical structure features are extracted. Semantic features are obtained through a pre-trained language model to represent the textual meaning of the node; structural path features represent the node's positional path in the directory tree; and hierarchical structure features identify the node's hierarchical type. Based on these features, a node similarity matrix is constructed, and the Hungarian algorithm is used to solve for the optimal bipartite graph matching, thereby obtaining first-level directory matching pairs and achieving the globally optimal node correspondence.
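A minimal sketch of this matching step is given below, under stated assumptions. The semantic similarity that the text obtains from a pre-trained language model is replaced here by a character-level stand-in (`difflib.SequenceMatcher`), and the weights in `node_sim` are hypothetical since the source does not specify how the sub-feature similarities are combined. The `hungarian` function is a standard potential-based implementation of the Hungarian algorithm written for this example.

```python
from difflib import SequenceMatcher


def text_sim(a: str, b: str) -> float:
    """Character-level stand-in for the language-model semantic similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def node_sim(n1: dict, n2: dict, weights=(0.6, 0.2, 0.2)) -> float:
    """Weighted sum of semantic, path and level similarity (weights are assumed)."""
    semantic = text_sim(n1["text"], n2["text"])
    path = text_sim(n1["path"], n2["path"])
    level = 1.0 if n1["level"] == n2["level"] else 0.0
    return weights[0] * semantic + weights[1] * path + weights[2] * level


def hungarian(cost):
    """Minimum-cost assignment; requires len(cost) <= len(cost[0])."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # row and column potentials
    p, way = [0] * (m + 1), [0] * (m + 1)    # p[j]: 1-based row assigned to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:  # grow an augmenting path for row i
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip assignments along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sorted((p[j] - 1, j - 1) for j in range(1, m + 1) if p[j])


def match_directories(first, second):
    """Return first-level matching pairs as (first_index, second_index)."""
    sim = [[node_sim(a, b) for b in second] for a in first]
    if len(first) <= len(second):
        return hungarian([[1.0 - s for s in row] for row in sim])
    # More template nodes than imported nodes: solve the transposed problem.
    cost_t = [[1.0 - sim[i][j] for i in range(len(first))] for j in range(len(second))]
    return sorted((i, j) for j, i in hungarian(cost_t))


first = [
    {"text": "Project Overview", "path": "Project Overview", "level": 1},
    {"text": "Project Background", "path": "Project Overview/Project Background", "level": 2},
    {"text": "Technical Solution", "path": "Technical Solution", "level": 1},
]
second = [
    {"text": "Technical Design", "path": "Technical Design", "level": 1},
    {"text": "Project Overview", "path": "Project Overview", "level": 1},
    {"text": "Background Description", "path": "Project Overview/Background Description", "level": 2},
]
pairs = match_directories(first, second)
print(pairs)  # [(0, 1), (1, 2), (2, 0)]
```

Because the algorithm minimizes cost, similarity is converted to cost as one minus similarity. Unlike a greedy per-node choice, the assignment maximizes the total similarity over all pairs at once.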

[0071] However, relying solely on similarity matching may lead to structural inconsistencies. Therefore, a path consistency check is further performed based on the first-level matching results to generate second-level directory matching pairs. Specifically, it checks whether the parent node of each pair of matching nodes also has a corresponding relationship. If the parent node is not aligned, the matching of that child node is deemed invalid and discarded. This constraint mechanism ensures that the final matching results are not only semantically consistent but also structurally reasonable, thus yielding high-quality directory matching relationships.
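The path consistency check can be sketched as a single filtering pass over the first-level matching pairs. The function name and the list-based parent representation (`parent1[i]` is the parent index of node `i`, or `None` for a top-level node) are assumptions for this example, as is the choice to treat a pair as inconsistent when only one of the two nodes has a parent.

```python
def path_consistency_filter(pairs, parent1, parent2):
    """Keep a pair only if both nodes are top-level or their parents are paired too."""
    matched = set(pairs)
    kept = []
    for a, b in pairs:
        pa, pb = parent1[a], parent2[b]
        if pa is None and pb is None:
            kept.append((a, b))  # both are top-level chapters
        elif pa is not None and pb is not None and (pa, pb) in matched:
            kept.append((a, b))  # the two parents are aligned with each other
        # Otherwise the pair is structurally inconsistent and is discarded.
    return kept


# First directory: 0 Overview, 1 Background (under 0), 2 Technical, 3 Architecture (under 2)
parent1 = [None, 0, None, 2]
# Second directory: 0 Technical, 1 Overview, 2 Background (under 1), 3 Structure (under 1)
parent2 = [None, None, 1, 1]
first_level_pairs = [(0, 1), (1, 2), (2, 0), (3, 3)]
second_level_pairs = path_consistency_filter(first_level_pairs, parent1, parent2)
print(second_level_pairs)  # [(0, 1), (1, 2), (2, 0)]
```

The pair (3, 3) is dropped because the parents of the two nodes, template node 2 and imported node 1, are not matched to each other.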

[0072] After directory alignment is completed, the difference detection phase begins. Based on the second-level directory matching pairs, a multi-category anomaly classification and detection strategy is executed, including hierarchical mismatch detection, node missing detection, node redundancy detection, sequence anomaly detection, and content difference detection. For example, hierarchical mismatch is determined by comparing node hierarchy depth, node missing or redundancy is determined by matching results and similarity thresholds, sequence anomalies are determined by sequential indexing, and content differences are determined by edit distance or semantic similarity. All anomalies are statistically analyzed, and the severity of the differences is assessed based on preset thresholds. When the number of anomalies is small, the system can automatically correct them; when the number of anomalies is large or involves critical nodes, manual intervention is required.
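The five detection categories and the threshold decision can be sketched as follows. All threshold values, the anomaly labels, the `critical` flag on template nodes, and the use of `difflib.SequenceMatcher` as the content similarity measure are assumptions made for illustration; the source names edit distance or semantic similarity without fixing a formula.

```python
from difflib import SequenceMatcher


def detect_anomalies(first, second, pairs, match_thr=0.4, content_thr=0.9):
    """Return a list of (anomaly_type, first_index, second_index) tuples."""
    def sim(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    anomalies, valid = [], []
    for a, b in sorted(pairs):
        s = sim(first[a]["text"], second[b]["text"])
        if s < match_thr:
            continue  # too dissimilar: both nodes are treated as unmatched
        valid.append((a, b))
        if first[a]["level"] != second[b]["level"]:
            anomalies.append(("level_mismatch", a, b))
        if s < content_thr:
            anomalies.append(("content_difference", a, b))

    highest = -1
    for a, b in valid:  # valid is sorted by template order
        if b < highest:
            anomalies.append(("order_anomaly", a, b))
        highest = max(highest, b)

    matched1 = {a for a, _ in valid}
    matched2 = {b for _, b in valid}
    anomalies += [("missing", a, None) for a in range(len(first)) if a not in matched1]
    anomalies += [("redundant", None, b) for b in range(len(second)) if b not in matched2]
    return anomalies


def needs_manual_review(anomalies, first, count_thr=3):
    """Manual review if the count reaches the threshold or a critical node is involved."""
    critical = any(a is not None and first[a].get("critical") for _, a, _ in anomalies)
    return len(anomalies) >= count_thr or critical


first = [{"text": "Project Overview", "level": 1}, {"text": "Project Background", "level": 2},
         {"text": "Technical Solution", "level": 1}, {"text": "Quotation", "level": 1}]
second = [{"text": "Technical Solution", "level": 1}, {"text": "Project Overview", "level": 1},
          {"text": "Project Background", "level": 1}, {"text": "Company Culture", "level": 1}]
found = detect_anomalies(first, second, [(0, 1), (1, 2), (2, 0)])
print([kind for kind, _, _ in found])
# ['level_mismatch', 'order_anomaly', 'missing', 'redundant']
print(needs_manual_review(found, first))  # True
```

With four anomalies and a count threshold of three, this example would be routed to manual intervention; raising the threshold to five would allow automatic correction.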

[0073] Based on the discrepancy analysis, a discrepancy merging and correction operation is performed. For hierarchical mismatch issues, the standard hierarchical structure in the first directory is used for adjustment first. For missing or redundant nodes, a union-set merging strategy is adopted to supplement necessary nodes from the first directory to the second directory, while retaining business-valued extended nodes in the second directory. For order anomalies, the order is rearranged according to the first directory. For content discrepancies, replacement or merging is performed based on semantic consistency. After automatic correction, the merged directory information is generated and displayed to the user in a visual and editable format, along with a detailed discrepancy detection report.
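A minimal sketch of the merging rules is shown below. It assumes that template wording and template level take precedence for matched nodes, and that every unmatched node from the second directory is retained as an extension and flagged for the user, since the source does not specify how business value is judged. The function name and the `source` tags are hypothetical.

```python
def merge_directories(first, second, pairs):
    """Merge both directories; each entry carries a 'source' tag for the report."""
    to_second = dict(pairs)  # first index -> second index
    matched2 = set(to_second.values())
    merged = []
    for a, node in enumerate(first):  # template order corrects order anomalies
        entry = {"text": node["text"], "level": node["level"]}  # template level and wording
        if a in to_second:
            entry["source"] = "matched"
            entry["second_index"] = to_second[a]
        else:
            entry["source"] = "added_from_template"  # missing node is supplemented
        merged.append(entry)

    # Keep unmatched imported nodes right after the matched node they followed.
    for b, node in enumerate(second):
        if b in matched2:
            continue
        entry = {"text": node["text"], "level": node["level"], "source": "user_extension"}
        prev = next((x for x in range(b - 1, -1, -1) if x in matched2), None)
        pos = len(merged)
        if prev is not None:
            pos = 1 + next(i for i, e in enumerate(merged) if e.get("second_index") == prev)
            while pos < len(merged) and merged[pos]["source"] == "user_extension":
                pos += 1  # preserve the original order of consecutive extensions
        merged.insert(pos, entry)
    return merged


first = [{"text": "Project Overview", "level": 1}, {"text": "Project Background", "level": 2},
         {"text": "Technical Solution", "level": 1}, {"text": "Quotation", "level": 1}]
second = [{"text": "Technical Solution", "level": 1}, {"text": "Project Overview", "level": 1},
          {"text": "Project Background", "level": 1}, {"text": "Company Culture", "level": 1}]
merged = merge_directories(first, second, [(0, 1), (1, 2), (2, 0)])
for e in merged:
    print(e["level"], e["text"], e["source"])
```

In this example "Project Background" is restored to level 2, "Quotation" is supplemented from the template, "Company Culture" is retained as an extension, and the order follows the first directory.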

[0074] In practical use, users can further adjust the generated directory information, such as modifying node levels, deleting redundant nodes, adding missing content, or adjusting the order. The directory structure is updated in real time, and all user actions are recorded to form user feedback data. This feedback data will be incorporated into the training dataset for subsequent iterative optimization of the deep learning model. For example, by learning how users adjust certain types of nodes, the model can automatically make judgments that better align with user habits in similar scenarios in the future, thereby continuously improving its intelligence level.
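One possible way to record user actions and turn them into training samples is sketched below. The record fields, the action names, and the rule that only level corrections become labeled samples are assumptions for illustration; the source states only that user actions are recorded and added to the training dataset.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FeedbackRecord:
    """One user edit on the merged directory (hypothetical schema)."""
    node_text: str
    action: str   # e.g. "change_level", "delete", "add", "reorder"
    before: dict
    after: dict


def to_training_samples(records):
    """Turn level corrections into (text, corrected label) samples for retraining."""
    samples = []
    for r in records:
        if r.action == "change_level":
            samples.append({"text": r.node_text, "label": r.after["type"]})
    return samples


records = [
    FeedbackRecord("Goal Setting", "change_level", {"type": "CHAPTER"}, {"type": "SECTION"}),
    FeedbackRecord("Company Culture", "delete", {"type": "CHAPTER"}, {}),
]
log_lines = [json.dumps(asdict(r)) for r in records]  # one JSON object per user action
samples = to_training_samples(records)
print(samples)  # [{'text': 'Goal Setting', 'label': 'SECTION'}]
```

Keeping the full action log separate from the derived samples would allow other action types to be converted into training signals later.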

[0075] This invention combines deep learning with structured modeling to achieve intelligent processing of the entire directory workflow, from template parsing to automatic correction. Compared with traditional manual methods, it not only improves efficiency but also enhances the standardization and consistency of the directory. Furthermore, user feedback drives continuous model optimization, enabling the system to improve over long-term use, which gives it practical engineering value.

[0076] Example 3: A deep learning-based bid document directory hierarchy recognition and verification system, including a directory information acquisition module, a hierarchy recognition module, a joint matching and alignment module, an anomaly classification and detection module, and a directory information merging and correction module; the directory information acquisition module is connected to the hierarchy recognition module, the hierarchy recognition module is connected to the joint matching and alignment module, the joint matching and alignment module is connected to the anomaly classification and detection module, and the anomaly classification and detection module is connected to the directory information merging and correction module. The directory information acquisition module is used to obtain the first directory information based on the bidding document template. The first directory information includes the first-level text information, the first-level node information, and the first-level relationship information. The second directory text information is also obtained through user import. The hierarchical recognition module is used to perform hierarchical recognition on the text information of the second directory based on a deep learning model, and to obtain the second-level node information and the second-level relationship information to form the second directory information. The joint matching and alignment module is used to extract hierarchical node features based on the first directory information and the second directory information, construct a joint matching and alignment mechanism through the node features, and output directory matching pairs. The anomaly classification and detection module is used to execute multi-category anomaly classification and detection strategies based on second-level directory matching pairs, determine the anomaly types of the category directories, and assess whether manual intervention is required by statistically analyzing the number of anomalies in the multi-category directories. The directory information merging and correction module is used to merge and correct differences in the second directory information based on multiple categories of directory anomaly types, and output the final directory information.

[0077] Figure 2 shows a system block diagram of a deep learning-based bid document directory hierarchy recognition and verification system according to an embodiment of the present invention. The system can be used to execute the steps of the method embodiment shown in Figure 1; its implementation principle and technical effects are similar and will not be repeated here.

[0078] Through the above embodiments, this invention automatically identifies directory levels using a deep learning model, rapidly converting unstructured text into a structured directory, significantly reducing manual parsing costs and improving processing efficiency. By integrating semantic, structural path, and hierarchical information into a joint matching mechanism, combined with path consistency constraints, it effectively avoids mismatches caused by relying solely on text similarity, significantly improving the accuracy and stability of directory alignment. A multi-category anomaly classification and detection mechanism comprehensively identifies structural and content issues in the directory, and quantitative assessment determines whether manual intervention is necessary, achieving a reasonable division of labor between automation and manual review. A difference merging and correction strategy ensures directory standardization while considering practical business flexibility, filling in missing content while retaining effective extended information. By introducing user feedback data for model iterative optimization, the system possesses continuous learning capabilities, constantly adapting to different project scenarios and user habits, thereby continuously improving performance and intelligence over long-term use. This invention not only improves the efficiency of bidding document preparation but also enhances the consistency and standardization of directory structures.

[0079] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any changes or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application.

[0080] Finally, the above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A deep learning-based method for identifying and verifying the directory hierarchy of tender documents, characterized in that it includes the following steps: first directory information is obtained by parsing the tender document template, the first directory information including first-level text information, first-level node information, and first-level relationship information; second directory text information is obtained through user import; based on a deep learning model, the second directory text information is hierarchically identified, and second-level node information and second-level relationship information are obtained respectively to form second directory information; based on the first directory information and the second directory information, hierarchical node features are extracted respectively, a joint matching and alignment mechanism is constructed through the node features, and directory matching pairs are output; based on the second-level directory matching pairs, a multi-category anomaly classification and detection strategy is executed to determine the multi-category directory anomaly types, and whether manual intervention is required is assessed by counting the number of multi-category directory anomalies; based on the multi-category directory anomaly types, the differences in the second directory information are merged and corrected to output the final directory information.

2. The deep learning-based method for identifying and verifying the directory hierarchy of tender documents according to claim 1, characterized in that the hierarchical identification of the second directory text information based on the deep learning model, which obtains the second-level node information and the second-level relationship information to form the second directory information, includes the following specific steps: the second directory text information is obtained and subjected to structured preprocessing to obtain text structure features, the text structure features including text semantic features and text position features; for the second directory text information, the text structure features of each line are extracted and input into the deep learning model to perform hierarchical recognition of the second directory text information, and BIOES is used to perform hierarchical annotation of the second directory text information; the second-level node information and the second-level relationship information are extracted based on the hierarchical annotation.

3. The deep learning-based method for identifying and verifying the directory hierarchy of tender documents according to claim 1, characterized in that, Based on the information of the first directory and the information of the second directory, the hierarchical node features are extracted respectively. A joint matching and alignment mechanism is constructed through the node features to output directory matching pairs. The directory matching pairs include first-level directory matching pairs and second-level directory matching pairs. The joint matching and alignment mechanism is as follows: a first-class matching and alignment is performed based on the hierarchical node features to obtain first-level directory matching pairs. Path consistency verification is performed based on the first-level directory matching pairs to output second-level directory matching pairs.

4. The deep learning-based method for identifying and verifying the directory hierarchy of tender documents according to claim 3, characterized in that, Based on the information from the first directory and the second directory, hierarchical node features are extracted respectively. A joint matching and alignment mechanism is constructed using the node features to output directory matching pairs. The specific steps are as follows: First-level node information is extracted from the first directory information to construct a first-level node set; second-level node information is extracted from the second directory information to construct a second-level node set. Based on the first-level node information and the second-level node information, hierarchical node features are extracted respectively. The hierarchical node features include the first-level node features and the second-level node information features. Based on the first-level node features and the second-level node information features, a first-class matching alignment is performed, and the similarity of node sub-features is calculated respectively. The comprehensive matching similarity is calculated by comprehensively processing the node sub-feature similarity, and the node feature similarity matrix is obtained. Based on the node feature similarity matrix, the Hungarian algorithm is used to solve the optimal bipartite graph matching and obtain the first-level directory matching pairs; Based on the first-level directory matching pairs, the path consistency of the first-level directory information and the second-level directory information is checked, and the second-level directory matching pairs are output.

5. The deep learning-based method for identifying and verifying the directory hierarchy of tender documents according to claim 3, characterized in that, The first-level node features include the first semantic node sub-features, the first structural path sub-features, and the first-level structural sub-features; The second-level node features include the second semantic node sub-features, the second structural path sub-features, and the second-level structural sub-features; the node sub-feature similarity includes the first sub-feature similarity, the second sub-feature similarity, and the third sub-feature similarity.

6. The deep learning-based method for identifying and verifying the directory hierarchy of tender documents according to claim 5, characterized in that, The path consistency check is as follows: if the parent node of any level node information in the first-level directory matching pair is not aligned, the alignment result of the first-level directory matching pair is invalid. The first-level directory matching pairs with invalid alignment results are removed, and the remaining first-level directory matching pairs are selected as second-level directory matching pairs.

7. The deep learning-based method for identifying and verifying the directory hierarchy of tender documents according to claim 3, characterized in that, Based on the second-level directory matching pair, a multi-category anomaly classification and detection strategy is executed to determine the category directory anomaly type. The multi-category anomaly classification and detection strategy includes hierarchical mismatch detection, node missing detection, node redundancy detection, sequence anomaly detection, and content difference detection.

8. The deep learning-based method for identifying and verifying the directory hierarchy of tender documents according to claim 7, characterized in that, To assess whether manual intervention is needed, the number of anomalies in multiple categories of directories is counted. The specific steps are as follows: By counting the number of abnormal differences in multiple categories, if the number of abnormal differences in multiple categories is greater than or equal to the abnormal number threshold, manual intervention is required; if the number of abnormal differences in multiple categories is less than the abnormal number threshold, manual intervention is not required.

9. The deep learning-based method for identifying and verifying the directory hierarchy of tender documents according to claim 8, characterized in that, Based on multiple categories of directory anomalies, the second directory information is merged and corrected to output the final directory information. The specific steps are as follows: Based on the preset difference merging rules, the merged directory information is generated and displayed to the user in an editable format, along with a difference detection report. The preset difference merging rules are as follows: for mismatched levels, the level of the first directory information is used; for missing or redundant nodes, the nodes are merged using the first and second directory information. Receive user modification operations on directory information, update the final directory based on user modification operations, and record user feedback data; User feedback data is added to the training dataset for iterative optimization of the deep learning model.

10. A deep learning-based bid document directory hierarchy recognition and verification system, applied to the deep learning-based bid document directory hierarchy recognition and verification method as described in any one of claims 1-9, characterized in that, The system includes a directory information acquisition module, a hierarchy recognition module, a joint matching and alignment module, an anomaly classification and detection module, and a directory information merging and correction module. The directory information acquisition module is used to obtain the first directory information based on the bidding document template. The first directory information includes the first-level text information, the first-level node information, and the first-level relationship information. The second directory text information is also obtained through user import. The hierarchical recognition module is used to perform hierarchical recognition on the text information of the second directory based on a deep learning model, and to obtain the second-level node information and the second-level relationship information to form the second directory information. The joint matching and alignment module is used to extract hierarchical node features based on the first directory information and the second directory information, construct a joint matching and alignment mechanism through the node features, and output directory matching pairs. The anomaly classification and detection module is used to execute multi-category anomaly classification and detection strategies based on second-level directory matching pairs, determine the anomaly types of the category directories, and assess whether manual intervention is required by statistically analyzing the number of anomalies in the multi-category directories. The directory information merging and correction module is used to merge and correct differences in the second directory information based on multiple categories of directory anomaly types, and output the final directory information.