An audit literature mind map generation method and system based on feature refinement and a large language model

By combining feature refinement with a large language model and graph neural networks, the method generates a structured audit mind map. This addresses the logical gaps left by existing techniques when handling audit documents and enables efficient, accurate understanding and analysis of those documents.

CN122114117B · Active · Publication Date: 2026-06-26 · SHENYUAN TECHNOLOGY (NANJING) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SHENYUAN TECHNOLOGY (NANJING) CO LTD
Filing Date
2026-04-30
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing large language models are unable to effectively reflect the professional attributes and knowledge organization logic of auditing academic literature, and the generated mind maps lack a structured presentation of key content in the auditing field.

Method used

By combining feature refinement with a large language model and graph neural network, the structured information of the target audit documents is extracted, keyword importance is calculated, hierarchical clustering and dynamic weighting are performed to generate a tree structure that conforms to the logic of the audit documents. Rule constraints and node pruning are then applied, and hybrid retrieval and parsing are performed in conjunction with the audit document knowledge base to finally generate a structured audit mind map.

Benefits of technology

It enhances the systematic and logical organization of audit literature, improves the efficiency of understanding and analysis, reduces the time cost for researchers and students to read and organize, and generates mind maps that are accurate and conform to audit field standards.

✦ Generated by Eureka AI based on patent content.

Figure CN122114117B_ABST

Abstract

The application discloses an audit literature mind map generation method and system based on feature refinement and a large language model. The method first extracts subject information, text information, reference information and hierarchical information from a target audit document and converts them into structured audit text information. It then uses a hybrid weighting model to calculate the global importance of keywords in the structured audit text information and extracts the target text keywords. The application efficiently generates audit mind maps for audit literature. A graph neural network enhances the logical coherence of the textual information and keeps the logical relationships between nodes rigorous, and rule constraints ensure that the text conforms to audit field specifications. This strengthens the systematic and logical organization of knowledge and allows fine control over the generated content. It also gives users accurate reading assistance for audit literature and greatly improves the efficiency of understanding and analyzing it.

Description

Technical Field

[0001] This invention relates to the field of document mind map generation technology, specifically to a method and system for generating audit document mind maps based on feature refinement and a large language model.

Background Technology

[0002] Academic literature in the field of auditing, especially journal articles, typically possesses strong theoretical depth and rigorous logical structure, encompassing a large amount of professional terminology, auditing standards, research hypotheses, and empirical analysis. These documents exhibit complex hierarchical relationships between research paradigms, argumentation paths, and conclusion derivations, placing high demands on readers' background knowledge and logical comprehension abilities. To effectively extract core viewpoints, clarify the argumentation framework, and construct a knowledge system, transforming the content of auditing literature into clearly structured mind maps has become an important means of improving literature reading efficiency and knowledge integration capabilities.

[0003] Existing large language models can be applied to automatic summarization and mind map generation for literature, but their outputs mostly rest on general text-understanding frameworks. The key nodes of the generated mind maps are usually limited to generic academic dimensions such as research background and motivation, research methods, main findings, and research contributions. This scheme suits general social-science or scientific papers, but it does not reflect the distinctive professional attributes and knowledge-organization logic of auditing academic literature. Auditing journal articles cover topics such as:

- the applicability of auditing standards;
- risk-oriented audit paths;
- internal control evaluation;
- fraud detection mechanisms;
- the structure of the audit evidence chain;
- independence requirements;
- regulatory compliance points.

The mind maps generated by existing large models often weaken or omit this content, or fail to present it in a structured way. It is therefore necessary to design a method and system for generating auditing literature mind maps based on feature refinement and a large language model.

Summary of the Invention

[0004] The purpose of this invention is to overcome a shortcoming of the existing technology. Existing large language models mostly produce output based on general text-understanding frameworks, so they struggle to reflect the distinctive professional attributes and knowledge-organization logic of audit academic literature. The invention provides a method and system for generating audit literature mind maps based on feature refinement and a large language model, which efficiently generates audit mind maps for audit literature. A graph neural network enhances the logical coherence of the textual information and keeps the logical relationships between nodes rigorous. Rule constraints ensure that the text conforms to audit field standards. Together, these enhance the systematic and logical organization of knowledge and allow fine-grained control over the generated content. The invention also gives users precise guidance for reading audit literature and significantly improves the efficiency of understanding and analyzing it.

[0005] To achieve the above objectives, the technical solution adopted by the present invention is as follows:

[0006] A method for generating audit document mind maps based on feature refinement and a large language model includes the following steps:

[0007] Step A: Extract the subject information, main text information, reference information, and hierarchical information from the target audit document and convert them into structured audit text information.

[0008] Step B: Use a hybrid weighting model to calculate the global importance of keywords in the structured audit text information and extract the target text keywords.

[0009] Step C: Apply hierarchical clustering and dynamic weighting to the target text keywords to generate a target text tree structure that conforms to the logic of the target audit document.

[0010] Step D: Use a graph neural network to aggregate and optimize the features of the semantic entities and logical relationships in the target text tree structure to obtain the target optimized topology. Then apply rule constraints to the target optimized topology to obtain the target constrained text.

[0011] Step E: Calculate the probability distribution of child nodes in the target constrained text to obtain the child node probability distribution results. Based on these results, set a confidence threshold and perform node pruning and hierarchical filtering on the target constrained text to obtain the target refined text.

[0012] Step F: Perform a hybrid search in the audit literature knowledge base based on the target refined text to obtain the logical framework of the related audit domain and the related standard entity relationship structure.

[0013] Step G: Use a large language model to parse the target refined text, the logical framework of the related audit domain, and the related standard entity relationship structure to obtain the mind map content information.

[0014] Step H: Input the mind map content information into a mind map building tool and generate the target audit document mind map.

[0015] In step A of the above method, the subject information, main text information, reference information, and hierarchical information are extracted from the target audit document and converted into structured audit text information. The target audit document is stored in PDF format, and the structured audit text information is stored in JSON format.

[0016] In step B of the above method, a hybrid weighting model is used to calculate the global importance of keywords in the structured audit text information and to extract the target text keywords. The global importance of a keyword measures its semantic relevance to the topic of the target audit document. Specifically, the hybrid weighting model combines TF-IDF weighting with Sentence-BERT similarity, as shown in formula (1).

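[0017] $$S(w_i) = \alpha \cdot \mathrm{TFIDF}(w_i) + (1-\alpha) \cdot \mathrm{sim}(w_i, \mathbf{d}) \tag{1}$$

[0018] where $S(w_i)$ is the global importance of candidate keyword $w_i$, $\alpha$ is the weight parameter, $\mathrm{TFIDF}(w_i)$ is the term frequency-inverse document frequency value of $w_i$, and $\mathrm{sim}(w_i, \mathbf{d})$ is the semantic similarity between $w_i$ and the document topic vector $\mathbf{d}$.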

[0019] In step C of the above method, hierarchical clustering and dynamic weighting are applied to the target text keywords to generate a target text tree structure that conforms to the logic of the target audit document. The specific steps are as follows:

[0020] Step C1 involves performing hierarchical clustering of the target text keywords using Sentence-BERT embedding clustering, as shown in formula (2).

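[0021] $$\mathrm{sim}(\mathbf{v}_i, \mathbf{v}_j) = \frac{\mathbf{v}_i \cdot \mathbf{v}_j}{\lVert \mathbf{v}_i \rVert \, \lVert \mathbf{v}_j \rVert} \tag{2}$$

[0022] where $\mathrm{sim}(\mathbf{v}_i, \mathbf{v}_j)$ is the similarity between node vectors $\mathbf{v}_i$ and $\mathbf{v}_j$. Both are node vectors corresponding to target text keywords.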

[0023] Step C2 involves using a dynamic weighting algorithm to dynamically weight and evaluate the keywords in the target text, as shown in formula (3).

[0024] (3)

[0025] The quantities in formula (3) are:

- the dynamic comprehensive evaluation value of the current node;
- the coverage of the nodes corresponding to the text keywords already generated;
- a dynamically weighted adjustment term based on semantic distance;
- the semantic distance between the node corresponding to the current text keyword and the target node.

[0026] In step D of the above method, a graph neural network aggregates and optimizes the features of the semantic entities and logical relationships in the target text tree structure to obtain the target optimized topology. Rule constraints are then applied to the target optimized topology to obtain the target constrained text. The specific steps are as follows:

[0027] Step D1 involves using a graph neural network to aggregate and optimize the semantic entities and logical relationships in the tree structure of the target text to obtain the optimized topology of the target, as shown in formula (4).

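[0028] $$H^{(l+1)} = \sigma\left(D^{-1/2} A D^{-1/2} H^{(l)} W^{(l)}\right) \tag{4}$$

[0029] where:

- $H^{(l)}$ and $H^{(l+1)}$ are the node feature matrices of layer $l$ and layer $l+1$;
- $\sigma$ is a non-linear activation function;
- $D$ is the degree matrix;
- $A$ is the adjacency matrix;
- $W^{(l)}$ is the weight matrix of layer $l$.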

[0030] Step D2: Apply rule constraints to the target optimized topology to obtain the target constrained text. The specific steps are as follows:

[0031] Step D21, define the rule loss as shown in formula (5).

[0032] (5)

[0033] The quantities in formula (5) are the rule loss, the set of audit rules, and the degree to which each rule in the audit rule set is satisfied.

[0034] Step D22: Construct the total loss function for rule constraints based on the rule loss, as shown in formula (6).

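[0035] $$L_{\text{total}} = L_{\text{task}} + \lambda L_{\text{rule}} \tag{6}$$

[0036] where $L_{\text{total}}$ is the total rule-constrained loss, $L_{\text{task}}$ is the loss of the natural language processing task, $\lambda$ is the rule constraint weight parameter, and $L_{\text{rule}}$ is the rule loss.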

[0037] In step E of the above method, the probability distribution of child nodes in the target constrained text is calculated to obtain the child node probability distribution results. A confidence threshold is then set based on these results, and node pruning and hierarchical filtering are performed on the target constrained text to obtain the target refined text. The specific steps are as follows:

[0038] Step E1 involves calculating the probability distribution of child nodes in the target constraint text and obtaining the child node probability distribution results. Specifically, the probability distribution of child nodes is calculated using a temperature coefficient and Top-k sampling, as shown in formula (7).

[0039] $$p_i' = \frac{\exp(p_i / T)}{\sum_{j \in V_k} \exp(p_j / T)} \tag{7}$$

[0040] where:

- $p_i'$ is the adjusted target probability of keyword $i$;
- $\exp(\cdot)$ is the exponential function with base $e$;
- $p_i$ is the original probability of keyword $i$;
- $p_j$ is the original probability of each keyword $j$ in the Top-k candidate set $V_k$;
- $T$ is the temperature coefficient.

[0041] Step E2 involves setting a confidence threshold based on the probability distribution of child nodes and performing node pruning and hierarchical filtering on the target constrained text to obtain the refined target text.

[0042] In step F of the above method, a hybrid search is performed in the audit literature knowledge base based on the target refined text. This yields the logical framework of the related audit domain and the related standard entity relationship structure. The audit literature knowledge base integrates multi-source heterogeneous data and uses natural language processing to parse, classify, and associate texts. The result is a knowledge network of multi-level audit topics, logical relationships, and format templates. The multi-source heterogeneous data includes audit journals, audit literature, audit standards, regulatory documents, industry reports, and case studies. The specific steps are as follows:

[0043] Step F1: Use a pre-trained text embedding model to vectorize the target refined text and obtain the refined text feature vector;

[0044] Step F2: In the audit document knowledge base, a hybrid retrieval strategy combining semantic retrieval based on dense vector similarity and sparse retrieval based on keyword matching is used to recall the initial candidate document fragments.

[0045] Step F3: Use a re-ranking model to re-score and re-rank the initial candidate document fragments by their relevance to the target refined text. Then take the top-k ranked content as the logical framework of the related audit domain and the related standard entity relationship structure.

[0046] In step G of the above method, a large language model parses the target refined text, the logical framework of the related audit domain, and the related standard entity relationship structure to obtain the mind map content information. The specific steps are as follows:

[0047] Step G1 involves combining the refined target text, the logical framework of the related audit domain, and the content of the related standard entity relationship structure to obtain structured prompts for generating the audit mind map;

[0048] Step G2 involves inputting structured prompts into a pre-trained large language model, applying format constraints, generating and outputting structured markdown content with clear parent-child node relationships, and then using the structured markdown content as the mind map content information.

[0049] In step H of the above method, the mind map content information is input into a mind map building tool to generate the target audit document mind map. The specific steps are as follows:

[0050] Step H1 involves performing lexical analysis and grammatical rule verification on the mind map content information to obtain the verified mind map content information. Then, based on text indentation features and hierarchical markers, the topics of each node and the tree-like topological relationships between nodes in the verified mind map content information are extracted.

[0051] Step H2 transforms the topics of each node and the tree-like topological relationships between nodes into an abstract syntax tree that can be recognized by the mind mapping tool. Then, the abstract syntax tree is input into the markdown engine within the mind mapping tool for graphical component mapping and automatic layout calculation to generate the target audit document mind map.

[0052] A mind map generation system for audit documents based on feature refinement and a large language model consists of eight modules:

- The text extraction module extracts the subject information, main text information, reference information, and hierarchical information from the target audit document and converts them into structured audit text information.
- The keyword extraction module uses a hybrid weighting model to calculate the global importance of keywords in the structured audit text information and extract the target text keywords.
- The clustering and weighting module uses hierarchical clustering and dynamic weighting to generate a target text tree structure that conforms to the logic of the target audit document.
- The text constraint module uses a graph neural network to aggregate and optimize the features of the semantic entities and logical relationships in the target text tree structure to obtain the target optimized topology. It then applies rule constraints to that topology to obtain the target constrained text.
- The text refinement module calculates the probability distribution of child nodes in the target constrained text and sets a confidence threshold based on the results. It then performs node pruning and hierarchical filtering on the target constrained text to obtain the target refined text.
- The text retrieval module performs a hybrid search in the audit literature knowledge base based on the target refined text. It obtains the logical framework of the related audit domain and the related standard entity relationship structure.
- The content information acquisition module uses a large language model to parse the target refined text, the logical framework of the related audit domain, and the related standard entity relationship structure to obtain the mind map content information.
- The mind map generation module inputs the mind map content information into a mind map building tool and generates the target audit literature mind map.

[0053] The beneficial effects of this invention are as follows. The method and system efficiently generate audit mind maps for audit documents. A graph neural network aggregates and optimizes the semantic entities and logical relationships in the target text tree structure. This enhances the logical coherence of the textual information and keeps the logical relationships between nodes rigorous. Rule constraints applied to the target optimized topology produce the target constrained text and ensure that the text conforms to audit domain standards. The child node probability distribution is calculated with a temperature coefficient and Top-k sampling, which makes it possible to adjust how detailed the mind map is. Text feature refinement overcomes the limitations of traditional keyword extraction methods in the audit context and addresses the problem of insufficient understanding of audit literature.

As a result, the invention:

- strengthens the systematic and logical organization of knowledge;
- allows fine-grained control over the generated content;
- gives users precise guidance for reading audit literature;
- significantly improves the efficiency of understanding and analyzing audit documents.

The audit literature knowledge base integrates multi-source heterogeneous data such as audit standards, regulations, and typical cases. Vector retrieval and prompt engineering inject this external knowledge, so the large language model can reference relevant authoritative information during generation. This improves the credibility of the output and reduces the time researchers, audit practitioners, and students spend reading and organizing literature. The invention generates structured, accurate audit literature mind maps without manual intervention and has significant application value and promising prospects.

Attached Figure Description

[0054] Figure 1 is an overall flowchart of the method for generating audit document mind maps based on feature refinement and a large language model according to the present invention.

Detailed Implementation

[0055] The present invention will now be further described with reference to the accompanying drawings.

[0056] As shown in Figure 1, the present invention provides a method for generating audit document mind maps based on feature refinement and a large language model, comprising the following steps:

[0057] Step A: Extract the subject information, main text information, reference information and hierarchical information from the target audit document and convert them into structured audit text information. The target audit document is stored in PDF format, and the structured audit text information is stored in JSON format.
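The patent specifies only that the input is a PDF and the output is JSON. It gives no schema and does not name a PDF parser. The minimal sketch below assumes the four kinds of information have already been extracted from the PDF. It shows one way to hold them in hypothetical dataclasses and serialize them to JSON, and all field names are illustrative assumptions.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Section:
    """One heading-delimited block of the main text (hypothetical schema)."""
    level: int   # hierarchical information: heading depth, 1 = top level
    title: str   # section heading
    text: str    # main text information under this heading


@dataclass
class StructuredAuditText:
    """Container for the four kinds of extracted information (hypothetical)."""
    subject: str                                            # subject information
    sections: List[Section] = field(default_factory=list)   # main text + hierarchy
    references: List[str] = field(default_factory=list)     # reference information


def to_json(doc: StructuredAuditText) -> str:
    """Serialize the structured audit text information to a JSON string."""
    # asdict() recurses into nested dataclasses, including those inside lists.
    return json.dumps(asdict(doc), ensure_ascii=False, indent=2)


doc = StructuredAuditText(
    subject="Example audit journal article",
    sections=[
        Section(1, "Introduction", "..."),
        Section(2, "Research hypotheses", "..."),
    ],
    references=["Example reference entry"],
)
print(to_json(doc))
```

Keeping heading depth as an explicit `level` field preserves the hierarchical information for the later tree-building steps.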

[0058] Step B involves using a hybrid weighting model to calculate the global importance of keywords in the structured audit text information and extracting target text keywords. The global importance of the keywords is used to measure the semantic relevance between the keywords and the topic of the target audit document. The hybrid weighting model specifically uses TF-IDF weighting and Sentence-BERT similarity for calculation, as shown in formula (1).

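[0059] $$S(w_i) = \alpha \cdot \mathrm{TFIDF}(w_i) + (1-\alpha) \cdot \mathrm{sim}(w_i, \mathbf{d}) \tag{1}$$

[0060] where $S(w_i)$ is the global importance of candidate keyword $w_i$, $\alpha$ is the weight parameter, $\mathrm{TFIDF}(w_i)$ is the term frequency-inverse document frequency value of $w_i$, and $\mathrm{sim}(w_i, \mathbf{d})$ is the semantic similarity between $w_i$ and the document topic vector $\mathbf{d}$.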
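A minimal sketch of the hybrid weighting in formula (1). It assumes the formula is the linear mix S(w) = α·TFIDF(w) + (1 − α)·sim(w, d), with cosine similarity as the semantic similarity. The smoothed IDF variant and the toy 2-d vectors are assumptions. In practice the keyword and topic vectors would come from a Sentence-BERT model, for example via the sentence-transformers library, which is out of scope here.

```python
import math
from collections import Counter
from typing import List, Sequence, Set


def tf_idf(term: str, doc_tokens: List[str], corpus: List[Set[str]]) -> float:
    """TF-IDF of `term` in one tokenized document against a corpus of token sets."""
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for d in corpus if term in d)        # document frequency
    idf = math.log(len(corpus) / (1 + df)) + 1      # smoothed IDF (assumption)
    return tf * idf


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def global_importance(term, doc_tokens, corpus, term_vec, topic_vec, alpha=0.5):
    """Assumed form of formula (1): alpha*TFIDF + (1 - alpha)*sim(term, topic)."""
    return alpha * tf_idf(term, doc_tokens, corpus) + (1 - alpha) * cosine(term_vec, topic_vec)


doc = ["audit", "risk", "audit", "control"]
corpus = [set(doc), {"market", "risk"}, {"audit", "fee"}]
# Toy 2-d vectors stand in for Sentence-BERT embeddings.
score = global_importance("audit", doc, corpus, [1.0, 0.0], [1.0, 0.0], alpha=0.5)
print(round(score, 4))  # 0.75
```

Raw TF-IDF values and cosine similarities live on different scales. An implementation would therefore likely normalize TF-IDF across candidate keywords before mixing, but the patent does not say whether it does.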

[0061] Step C: Apply hierarchical clustering and dynamic weighting to the target text keywords to generate a target text tree structure that conforms to the logic of the target audit document. The specific steps are as follows:

[0062] Step C1 involves performing hierarchical clustering of the target text keywords using Sentence-BERT embedding clustering, as shown in formula (2).

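[0063] $$\mathrm{sim}(\mathbf{v}_i, \mathbf{v}_j) = \frac{\mathbf{v}_i \cdot \mathbf{v}_j}{\lVert \mathbf{v}_i \rVert \, \lVert \mathbf{v}_j \rVert} \tag{2}$$

[0064] where $\mathrm{sim}(\mathbf{v}_i, \mathbf{v}_j)$ is the similarity between node vectors $\mathbf{v}_i$ and $\mathbf{v}_j$. Both are node vectors corresponding to target text keywords.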
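The patent names Sentence-BERT embedding clustering but not the linkage criterion or stopping rule. The sketch below assumes average-linkage agglomerative clustering over cosine similarity. It stops when no pair of clusters is at least as similar as a chosen threshold. The merge history it records is the dendrogram from which a keyword tree can be read. For real workloads, `scipy.cluster.hierarchy.linkage(..., method="average", metric="cosine")` performs the same kind of clustering.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def agglomerative(vectors: List[Sequence[float]], threshold: float):
    """Average-linkage agglomerative clustering on cosine similarity.

    Repeatedly merges the most similar pair of clusters until no pair's
    average similarity reaches `threshold`. Returns the final clusters
    (lists of vector indices) and the merge history (the dendrogram).
    """
    clusters: List[List[int]] = [[i] for i in range(len(vectors))]
    merges: List[Tuple[List[int], List[int], float]] = []
    while len(clusters) > 1:
        best, pair = -1.0, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sims = [cosine(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b]]
                avg = sum(sims) / len(sims)
                if avg > best:
                    best, pair = avg, (a, b)
        if best < threshold:
            break
        a, b = pair
        left, right = clusters[a], clusters[b]
        merges.append((left, right, best))
        clusters[a] = sorted(left + right)
        del clusters[b]   # b > a, so index a stays valid
    return clusters, merges


# Toy keyword embeddings forming two well-separated groups.
vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clusters, merges = agglomerative(vecs, threshold=0.8)
print(clusters)      # [[0, 1], [2, 3]]
print(len(merges))   # 2
```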

[0065] Step C2 involves using a dynamic weighting algorithm to dynamically weight and evaluate the keywords in the target text, as shown in formula (3).

[0066] (3)

[0067] The quantities in formula (3) are:

- the dynamic comprehensive evaluation value of the current node;
- the coverage of the nodes corresponding to the text keywords already generated;
- a dynamically weighted adjustment term based on semantic distance;
- the semantic distance between the node corresponding to the current text keyword and the target node.

[0068] Step D involves using a graph neural network to aggregate and optimize the semantic entities and logical relationships in the tree structure of the target text to obtain the optimized topology. Then, rule constraints are applied to the optimized topology to obtain the constrained target text. The specific steps are as follows:

[0069] Step D1 involves using a graph neural network to aggregate and optimize the semantic entities and logical relationships in the tree structure of the target text to obtain the optimized topology of the target, as shown in formula (4).

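[0070] $$H^{(l+1)} = \sigma\left(D^{-1/2} A D^{-1/2} H^{(l)} W^{(l)}\right) \tag{4}$$

[0071] where:

- $H^{(l)}$ and $H^{(l+1)}$ are the node feature matrices of layer $l$ and layer $l+1$;
- $\sigma$ is a non-linear activation function;
- $D$ is the degree matrix;
- $A$ is the adjacency matrix;
- $W^{(l)}$ is the weight matrix of layer $l$.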
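Assume formula (4) is the standard graph-convolution propagation rule H^(l+1) = σ(D^-1/2 A D^-1/2 H^(l) W^(l)). One layer can then be sketched in NumPy as follows. ReLU stands in for σ, and self-loops are assumed to be already included in A, as is common practice. The patent states neither choice, and the toy tree and weights are illustrative.

```python
import numpy as np


def gcn_layer(H: np.ndarray, A: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One graph-convolution layer: ReLU(D^-1/2 A D^-1/2 H W).

    H: (n, f_in) node features; A: (n, n) adjacency matrix (self-loops
    assumed already added); W: (f_in, f_out) layer weights. ReLU stands
    in for the unspecified non-linear activation.
    """
    deg = A.sum(axis=1).astype(float)           # diagonal of the degree matrix D
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    d_inv_sqrt[nz] = deg[nz] ** -0.5            # D^-1/2, guarding isolated nodes
    A_norm = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)


# Toy mind-map tree: root 0 with children 1 and 2, plus self-loops.
A = np.array([[1, 1, 1],
              [1, 1, 0],
              [1, 0, 1]], dtype=float)
H0 = np.eye(3)                 # one-hot node features
W0 = np.ones((3, 1))           # illustrative weights
H1 = gcn_layer(H0, A, W0)
print(H1.ravel())
```

Symmetric normalization keeps high-degree nodes such as a topic root from dominating the aggregated features. Stacking several layers lets information flow between non-adjacent nodes of the tree.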

[0072] Step D2: Apply rule constraints to the target optimized topology to obtain the target constrained text. The specific steps are as follows:

[0073] Step D21, define the rule loss as shown in formula (5).

[0074] (5)

[0075] The quantities in formula (5) are the rule loss, the set of audit rules, and the degree to which each rule in the audit rule set is satisfied.

[0076] Step D22: Construct the total loss function for rule constraints based on the rule loss, as shown in formula (6).

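[0077] $$L_{\text{total}} = L_{\text{task}} + \lambda L_{\text{rule}} \tag{6}$$

[0078] where $L_{\text{total}}$ is the total rule-constrained loss, $L_{\text{task}}$ is the loss of the natural language processing task, $\lambda$ is the rule constraint weight parameter, and $L_{\text{rule}}$ is the rule loss.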
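The patent combines a task loss with a weighted rule loss, L_total = L_task + λ·L_rule, but does not define how rule satisfaction is scored. The sketch below assumes each audit rule is a hypothetical predicate returning a satisfaction degree in [0, 1]. It takes the rule loss as the mean shortfall (1 − satisfaction) over the rule set. Both rules and that aggregation are illustrative assumptions, as is the dict-based node format.

```python
from typing import Callable, Dict, List

# A node tree as a dict {"title": str, "children": [...]} (hypothetical format).
Rule = Callable[[Dict], float]   # returns a satisfaction degree in [0, 1]


def has_children_at_root(tree: Dict) -> float:
    """Hypothetical rule: the root topic should have at least one child."""
    return 1.0 if tree.get("children") else 0.0


def titles_are_short(tree: Dict, max_len: int = 20) -> float:
    """Hypothetical rule: fraction of node titles within max_len characters."""
    titles: List[str] = []
    stack = [tree]
    while stack:
        node = stack.pop()
        titles.append(node["title"])
        stack.extend(node.get("children", []))
    return sum(len(t) <= max_len for t in titles) / len(titles)


def rule_loss(tree: Dict, rules: List[Rule]) -> float:
    """Assumed aggregation: mean of (1 - satisfaction) over the rule set."""
    return sum(1.0 - r(tree) for r in rules) / len(rules)


def total_loss(task_loss: float, tree: Dict, rules: List[Rule], lam: float = 0.5) -> float:
    """L_total = L_task + lambda * L_rule."""
    return task_loss + lam * rule_loss(tree, rules)


tree = {"title": "Audit quality", "children": [
    {"title": "Key audit matters", "children": []},
    {"title": "Internal control evaluation and its determinants", "children": []},
]}
rules = [has_children_at_root, titles_are_short]
print(round(rule_loss(tree, rules), 4))         # 0.1667
print(round(total_loss(1.2, tree, rules), 4))   # 1.2833
```

These hard predicates only illustrate the bookkeeping. Training a network against such a loss would need differentiable (soft) satisfaction scores.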

[0079] Step E involves calculating the probability distribution of child nodes in the target constraint text and obtaining the results. Then, based on these results, a confidence threshold is set, and node pruning and hierarchical filtering are performed on the target constraint text to obtain the refined target text. The specific steps are as follows:

[0080] Step E1 involves calculating the probability distribution of child nodes in the target constraint text and obtaining the child node probability distribution results. Specifically, the probability distribution of child nodes is calculated using a temperature coefficient and Top-k sampling, as shown in formula (7).

[0081] $$p_i' = \frac{\exp(p_i / T)}{\sum_{j \in V_k} \exp(p_j / T)} \tag{7}$$

[0082] where:

- $p_i'$ is the adjusted target probability of keyword $i$;
- $\exp(\cdot)$ is the exponential function with base $e$;
- $p_i$ is the original probability of keyword $i$;
- $p_j$ is the original probability of each keyword $j$ in the Top-k candidate set $V_k$;
- $T$ is the temperature coefficient.
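A minimal sketch of temperature-scaled Top-k renormalization for step E1. It assumes formula (7) has the form p'_i = exp(p_i/T) / Σ_{j∈top-k} exp(p_j/T), where p holds the original keyword probabilities (or scores). The keyword names and numbers are illustrative. A lower T sharpens the distribution toward the strongest children, and a higher T flattens it, which is how the temperature can tune how much detail survives into the mind map.

```python
import math
from typing import Dict


def temperature_top_k(probs: Dict[str, float], k: int, T: float) -> Dict[str, float]:
    """Keep the k highest-scoring keywords and renormalize with temperature T.

    Implements p'_i = exp(p_i / T) / sum_{j in top-k} exp(p_j / T).
    """
    if T <= 0:
        raise ValueError("temperature must be positive")
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    m = max(p for _, p in top)                      # subtract max for numerical stability
    weights = {w: math.exp((p - m) / T) for w, p in top}
    z = sum(weights.values())
    return {w: v / z for w, v in weights.items()}


children = {"risk assessment": 0.5, "internal control": 0.3,
            "audit fees": 0.15, "auditor tenure": 0.05}
sharp = temperature_top_k(children, k=3, T=0.1)
flat = temperature_top_k(children, k=3, T=10.0)
print(max(sharp, key=sharp.get))   # risk assessment
```

Subtracting the maximum before exponentiating leaves the result unchanged mathematically but avoids overflow for very small temperatures.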

[0083] Step E2 involves setting a confidence threshold based on the probability distribution of child nodes and performing node pruning and hierarchical filtering on the target constrained text to obtain the refined target text.
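Step E2 fixes neither the threshold value nor the filtering rule. As an illustrative sketch, assume each node carries its adjusted probability from step E1 under a hypothetical `p` key. Nodes below the confidence threshold are pruned together with their subtrees, and a hypothetical maximum depth stands in for hierarchical filtering.

```python
from typing import Dict, Optional


def prune(node: Dict, threshold: float, max_depth: int, depth: int = 0) -> Optional[Dict]:
    """Return a pruned copy of `node`, or None if the node itself is dropped.

    A node is kept only if its confidence `p` reaches `threshold` (the root is
    always kept) and it lies within `max_depth` levels of the root.
    """
    if depth > 0 and node.get("p", 0.0) < threshold:
        return None                      # low-confidence node: drop its subtree
    if depth > max_depth:
        return None                      # hierarchical filtering by depth
    kept = [c for c in (prune(ch, threshold, max_depth, depth + 1)
                        for ch in node.get("children", [])) if c is not None]
    return {"title": node["title"], "p": node.get("p", 1.0), "children": kept}


tree = {"title": "Audit quality", "children": [
    {"title": "Key audit matters", "p": 0.62, "children": [
        {"title": "Disclosure tone", "p": 0.08, "children": []}]},
    {"title": "Audit fees", "p": 0.21, "children": []},
]}
pruned = prune(tree, threshold=0.2, max_depth=2)
print([c["title"] for c in pruned["children"]])   # ['Key audit matters', 'Audit fees']
```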

[0084] Step F involves performing a hybrid search in the audit literature knowledge base based on the refined target text to obtain the logical framework and related standard entity relationship structure of the audit domain. The audit literature knowledge base integrates multi-source heterogeneous data, employs natural language processing technology to parse, classify, and associate texts, forming a knowledge network containing multi-level audit topics, logical relationships, and format templates. The multi-source heterogeneous data includes audit journals, audit literature, audit standards, regulatory documents, industry reports, and case studies. The specific steps are as follows.

[0085] Step F1: Use a pre-trained text embedding model to vectorize the target refined text and obtain the refined text feature vector;

[0086] Step F2: In the audit document knowledge base, a hybrid retrieval strategy combining semantic retrieval based on dense vector similarity and sparse retrieval based on keyword matching is used to recall the initial candidate document fragments.

[0087] Step F3: Use a re-ranking model to re-score and re-rank the initial candidate document fragments by their relevance to the target refined text. Then take the top-k ranked content as the logical framework of the related audit domain and the related standard entity relationship structure.
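The patent does not say how dense and sparse scores are combined or which re-ranking model is used. The sketch below assumes recall by a weighted sum of min-max-normalized dense (cosine) and sparse (keyword-overlap) scores. A hypothetical `rerank` callable takes the place of a real cross-encoder re-ranker. BM25 would be the usual sparse scorer in practice, and the toy chunks and vectors are illustrative.

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def minmax(scores: List[float]) -> List[float]:
    """Scale scores to [0, 1] so dense and sparse scores are comparable."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]


def hybrid_recall(query_vec, query_terms, chunks: List[Dict], beta=0.5, n=10) -> List[Dict]:
    """Recall candidates by beta*dense + (1 - beta)*sparse (assumed fusion)."""
    dense = minmax([cosine(query_vec, c["vec"]) for c in chunks])
    sparse = minmax([len(set(query_terms) & set(c["terms"])) for c in chunks])
    fused = [beta * d + (1 - beta) * s for d, s in zip(dense, sparse)]
    order = sorted(range(len(chunks)), key=lambda i: fused[i], reverse=True)
    return [chunks[i] for i in order[:n]]


def rerank_top_k(query: str, candidates: List[Dict],
                 rerank: Callable[[str, str], float], k: int) -> List[Dict]:
    """Re-score candidates with a (hypothetical) re-ranker and keep the top k."""
    return sorted(candidates, key=lambda c: rerank(query, c["text"]), reverse=True)[:k]


chunks = [
    {"text": "Risk assessment procedures", "vec": [1.0, 0.1], "terms": ["risk", "assessment"]},
    {"text": "Audit fee determinants", "vec": [0.1, 1.0], "terms": ["fee"]},
    {"text": "Internal control deficiencies and risk", "vec": [0.8, 0.4], "terms": ["control", "risk"]},
]
recalled = hybrid_recall([1.0, 0.0], ["risk", "assessment"], chunks, n=2)


# Toy word-overlap scorer standing in for a cross-encoder re-ranker.
def overlap(q: str, t: str) -> float:
    return len(set(q.lower().split()) & set(t.lower().split()))


top = rerank_top_k("risk assessment", recalled, overlap, k=1)
print([c["text"] for c in top])
```

Normalizing each score list before fusion keeps one retriever's raw scale from swamping the other. Rank-based fusion such as reciprocal rank fusion is a common alternative.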

[0088] Step G involves using a large language model to parse the target refined text, the logical framework of the related audit domain, and the structure of related standard entity relationships to obtain mind map content information. The specific steps are as follows:

[0089] Step G1 involves combining the refined target text, the logical framework of the related audit domain, and the content of the related standard entity relationship structure to obtain structured prompts for generating the audit mind map;

[0090] Step G2 involves inputting structured prompts into a pre-trained large language model, applying format constraints, generating and outputting structured markdown content with clear parent-child node relationships, and then using the structured markdown content as the mind map content information.
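Step G1 combines the refined text, the retrieved domain framework, and the standard entity relationships into a structured prompt with format constraints. The patent does not publish the prompt itself. The template below is a hypothetical illustration, and its section names, wording, and node-label limit are assumptions.

```python
from typing import List

# Hypothetical template; the patent's actual prompt wording is not disclosed.
TEMPLATE = """You are an auditing research assistant.
## Refined document content
{refined}
## Related audit-domain logical framework
{framework}
## Related standard-entity relationships
{relations}
## Output format
- Output Markdown only.
- Use one '#' heading for the root topic and nested '-' bullets for child nodes.
- Keep each node label under {max_label} characters.
"""


def build_prompt(refined: str, framework: List[str], relations: List[str],
                 max_label: int = 20) -> str:
    """Assemble a structured mind-map prompt from the three inputs of step G1."""
    return TEMPLATE.format(
        refined=refined.strip(),
        framework="\n".join(f"- {item}" for item in framework),
        relations="\n".join(f"- {item}" for item in relations),
        max_label=max_label,
    )


prompt = build_prompt(
    "Key audit matters disclosure and audit quality ...",
    ["Risk-oriented audit path", "Audit evidence chain"],
    ["Key audit matters -> auditor judgment"],
)
print(prompt)
```

Stating the output grammar (headings and nested bullets) explicitly in the prompt is what makes the model's Markdown predictable enough to parse in step H.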

[0091] Step H involves inputting the mind map content into the mind map building tool and generating a mind map of the target audit document. The specific steps are as follows:

[0092] Step H1 involves performing lexical analysis and grammatical rule verification on the mind map content information to obtain the verified mind map content information. Then, based on text indentation features and hierarchical markers, the topics of each node and the tree-like topological relationships between nodes in the verified mind map content information are extracted.
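A minimal sketch of the tree extraction in step H1. It assumes the model's Markdown uses `#` headings for the upper levels and `-` bullets indented two spaces per level beneath them; the patent does not state this convention. Any non-blank line that matches neither form is rejected, a crude stand-in for grammar verification. Tools such as markmap-cli do their own parsing, so this only illustrates the idea.

```python
import re
from typing import Dict, List

HEADING = re.compile(r"^(#{1,6})\s+(.+)$")
BULLET = re.compile(r"^( *)[-*]\s+(.+)$")


def parse_outline(markdown: str, indent: int = 2) -> Dict:
    """Build a {'title', 'children'} tree from headings and indented bullets."""
    root = {"title": "", "children": []}
    stack: List = [(0, root)]            # (level, node); level 0 is a virtual root
    heading_level = 0
    for line_no, line in enumerate(markdown.splitlines(), 1):
        if not line.strip():
            continue
        if m := HEADING.match(line):
            level = len(m.group(1))
            heading_level = level
        elif m := BULLET.match(line):
            # Bullets sit below the most recent heading, one level per indent step.
            level = heading_level + 1 + len(m.group(1)) // indent
        else:
            raise ValueError(f"line {line_no}: not a heading or bullet: {line!r}")
        node = {"title": m.group(2).strip(), "children": []}
        while stack[-1][0] >= level:     # climb to the nearest shallower node
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root


md = """# Audit quality
## Key audit matters
- Disclosure
  - Tone
- Auditor judgment
## Audit fees
"""
tree = parse_outline(md)
print([c["title"] for c in tree["children"][0]["children"]])
```

The parser uses assignment expressions, so it requires Python 3.8 or later.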

[0093] Step H2 transforms the topics of each node and the tree-like topological relationships between nodes into an abstract syntax tree that can be recognized by the mind mapping tool. Then, the abstract syntax tree is input into the markdown engine within the mind mapping tool for graphical component mapping and automatic layout calculation to generate the target audit document mind map.

[0094] A mind map generation system for audit documents based on feature refinement and a large language model consists of eight modules:

- The text extraction module extracts the subject information, main text information, reference information, and hierarchical information from the target audit document and converts them into structured audit text information.
- The keyword extraction module uses a hybrid weighting model to calculate the global importance of keywords in the structured audit text information and extract the target text keywords.
- The clustering and weighting module uses hierarchical clustering and dynamic weighting to generate a target text tree structure that conforms to the logic of the target audit document.
- The text constraint module uses a graph neural network to aggregate and optimize the features of the semantic entities and logical relationships in the target text tree structure to obtain the target optimized topology. It then applies rule constraints to that topology to obtain the target constrained text.
- The text refinement module calculates the probability distribution of child nodes in the target constrained text and sets a confidence threshold based on the results. It then performs node pruning and hierarchical filtering on the target constrained text to obtain the target refined text.
- The text retrieval module performs a hybrid search in the audit literature knowledge base based on the target refined text. It obtains the logical framework of the related audit domain and the related standard entity relationship structure.
- The content information acquisition module uses a large language model to parse the target refined text, the logical framework of the related audit domain, and the related standard entity relationship structure to obtain the mind map content information.
- The mind map generation module inputs the mind map content information into a mind map building tool and generates the target audit literature mind map.

[0095] To better illustrate the effects of this invention, a specific embodiment of generating audit document mind maps using the audit document mind map generation method and system of this invention is described below.

[0096] In this embodiment, the selected large language model is Qwen3-30B-A3B-Instruct-2507 from the Tongyi Qianwen (Qwen) series. This model has strong language understanding and generation capabilities and performs well in terms of parameter scale, reasoning ability, and instruction following, so it can accurately parse complex professional text and produce high-quality structured output. In this step, the text generated after feature refinement is combined with the corresponding information retrieved from the audit literature knowledge base to define the set of core key points. Formatting constraints, such as the hierarchical structure of the mind map, node naming conventions, and the way logical relationships are expressed, are also specified. All of this information is then integrated into a structured prompt and input into the Qwen3-30B-A3B-Instruct-2507 model, which generates the mind map node content as text in Markdown syntax.
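The assembly of the structured prompt described above can be sketched as a plain string template. The function name `build_mind_map_prompt`, the section titles, and the exact constraint wording are hypothetical; the patent does not publish its prompt, and sending the prompt to the Qwen model is outside the scope of this sketch.

```python
from typing import List, Tuple


def build_mind_map_prompt(
    key_points: List[str],
    framework: str,
    relations: List[Tuple[str, str, str]],
    max_depth: int = 4,
) -> str:
    """Assemble a structured prompt (hypothetical template, not the patent's)."""
    points = "\n".join(f"- {p}" for p in key_points)
    # Each relation is a (head entity, relation, tail entity) triple.
    rels = "\n".join(f"- {h} --{r}--> {t}" for h, r, t in relations)
    return (
        "You are an audit expert. Generate a mind map for the document below.\n\n"
        f"## Core key points\n{points}\n\n"
        f"## Audit-domain logical framework\n{framework}\n\n"
        f"## Standard entity relationships\n{rels}\n\n"
        "## Format constraints\n"
        "- Output Markdown only: '#' headings for the top levels, '-' bullets below.\n"
        f"- Use at most {max_depth} levels.\n"
        "- Name each node with a short noun phrase.\n"
        "- Each child node must refine or explain its parent node.\n"
    )
```

Keeping the refined key points, the retrieved framework, the entity relationships, and the format constraints in separately labelled sections makes it easy to vary one input while holding the others fixed.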

[0097] In this embodiment, the mind map node content generated by the large language model is imported into a dedicated mind map building tool, markmap-cli, which converts the text-based node content into a graphical mind map that displays the information structure and logical relationships intuitively and clearly.

[0098] In summary, the present invention provides a method and system for generating mind maps of audit documents based on feature refinement and a large language model. First, the subject information, main text information, reference information, and hierarchical information are extracted from the target audit document and converted into structured audit text information. A hybrid weighting model is then used to calculate the global importance of keywords in the structured audit text information and extract the target text keywords. Next, hierarchical clustering and dynamic weighting are applied to the target text keywords to generate a target text tree structure that conforms to the logic of the target audit document. A graph neural network then performs feature aggregation and optimization on the semantic entities and logical relationships in the target text tree structure to obtain the target optimized topology, and rule constraints are applied to this topology to obtain the target constraint text. The probability distribution of the child nodes in the target constraint text is then calculated; based on this distribution, a confidence threshold is set, and node pruning and hierarchical filtering are performed on the target constraint text to obtain the refined target text. Next, a hybrid search is conducted in the audit literature knowledge base based on the refined target text to obtain the logical framework and standard entity relationship structure of the relevant audit domain. A large language model then parses the refined target text, the logical framework of the relevant audit domain, and the standard entity relationship structure to obtain the mind map content information. Finally, the mind map content information is input into a mind map building tool to generate the mind map of the target audit literature.
The method and system thus efficiently generate audit mind maps for audit documents. Using a graph neural network to aggregate and optimize the semantic entities and logical relationships in the target text tree structure strengthens the logical coherence of the text information and keeps the logical relationships between nodes rigorous. Applying rule constraints to the target optimized topology to obtain the target constraint text ensures that the text information conforms to audit domain standards. Calculating the child node probability distribution with a temperature coefficient and Top-k sampling makes it possible to adjust how detailed the mind map is. Through text feature refinement, the invention overcomes the insufficient semantic understanding of the professional audit context that limits traditional keyword extraction methods. It improves the systematic and logical organization of knowledge, allows fine-grained control over the generated content, gives users precise guidance when reading audit documents, and significantly improves the efficiency of understanding and analyzing them. By building an audit document knowledge base that integrates multi-source heterogeneous data such as audit standards, regulatory documents, and typical cases, and by injecting external knowledge through vector retrieval and prompt engineering, the large language model can consult relevant authoritative information during generation. This improves the credibility of the output and reduces the time researchers, audit practitioners, and students spend reading documents and organizing knowledge.
This invention, which generates structured and accurate audit document mind maps without manual intervention, has significant application value and prospects.

[0099] The foregoing has shown and described the basic principles, main features, and advantages of the present invention. Those skilled in the art should understand that the present invention is not limited to the above embodiments. The embodiments and descriptions in the specification are merely illustrative of the principles of the invention. Various changes and modifications can be made to the invention without departing from its spirit and scope, and all such changes and modifications fall within the scope of the present invention as claimed. The scope of protection of the present invention is defined by the appended claims and their equivalents.

Claims

1. A method for generating mind maps of audit documents based on feature refinement and large language models, characterized in that it includes the following steps: Step A: extract the subject information, main text information, reference information, and hierarchical information from the target audit document and convert them into structured audit text information; Step B: use a hybrid weighting model to calculate the global importance of keywords in the structured audit text information and extract the target text keywords; Step C: use hierarchical clustering and dynamic weighting to generate a target text tree structure that conforms to the logic of the target audit document; Step D: use a graph neural network to aggregate and optimize the semantic entities and logical relationships in the target text tree structure to obtain the target optimized topology, and then apply rule constraints to the optimized topology to obtain the target constraint text, specifically as follows: Step D1: use a graph neural network to aggregate and optimize the semantic entities and logical relationships in the target text tree structure to obtain the target optimized topology, as shown in formula (4):
$$H^{(l+1)} = \sigma\left(D^{-1/2} A D^{-1/2} H^{(l)} W^{(l)}\right) \quad (4)$$
where $H^{(l)}$ and $H^{(l+1)}$ are the node feature matrices of layer $l$ and layer $l+1$, $\sigma$ is a non-linear activation function, $D$ is the degree matrix, $A$ is the adjacency matrix, and $W^{(l)}$ is the weight matrix of layer $l$; Step D2: apply rule constraints to the target optimized topology to obtain the target constraint text, specifically as follows: Step D21: define the rule loss as shown in formula (5), where $L_{rule}$ is the rule loss, $R$ is the set of audit rules, and $\mathrm{Sat}(r)$ is the satisfaction level of rule $r$ in the audit rule set $R$; Step D22: construct the total rule-constrained loss function from the rule loss, as shown in formula (6):
$$L_{total} = L_{NLP} + \lambda L_{rule} \quad (6)$$
where $L_{total}$ is the total rule-constrained loss, $L_{NLP}$ is the loss of the natural language processing task, $\lambda$ is the rule-constraint weight parameter, and $L_{rule}$ is the rule loss; Step E: calculate the probability distribution of child nodes in the target constraint text to obtain the child node probability distribution result, set a confidence threshold based on that result, and perform node pruning and hierarchical filtering on the target constraint text to obtain the target refined text; Step F: perform a hybrid search in the audit literature knowledge base based on the target refined text to obtain the logical framework of the related audit domain and the related standard entity relationship structure; Step G: use a large language model to parse the target refined text, the logical framework of the related audit domain, and the related standard entity relationship structure to obtain the mind map content information; Step H: input the mind map content information into the mind map building tool to generate the target audit document mind map.
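The graph-convolution step of formula (4) and the total loss of formula (6) can be sketched in plain Python. This is an illustration under stated assumptions: ReLU is used for $\sigma$, self-loops are added to $A$ before normalization (a common choice in graph convolutional networks that the claim does not specify), and formula (5) is not implemented because its exact form is not given in the text, so $L_{rule}$ is passed in as a number. The names `gcn_layer` and `total_loss` are hypothetical.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(H: Matrix, A: Matrix, W: Matrix,
              sigma: Callable[[float], float] = lambda z: max(0.0, z),
              self_loops: bool = True) -> Matrix:
    """One propagation step H' = sigma(D^-1/2 A D^-1/2 H W), as in formula (4)."""
    n = len(A)
    # Assumption: add self-loops (A + I) so each node keeps its own features.
    A = [[A[i][j] + (1.0 if self_loops and i == j else 0.0) for j in range(n)]
         for i in range(n)]
    deg = [sum(row) for row in A]  # diagonal entries of the degree matrix D
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    A_norm = [[inv_sqrt[i] * A[i][j] * inv_sqrt[j] for j in range(n)]
              for i in range(n)]
    Z = matmul(matmul(A_norm, H), W)
    return [[sigma(z) for z in row] for row in Z]


def total_loss(nlp_loss: float, rule_loss: float, lam: float = 0.5) -> float:
    """Formula (6): L_total = L_NLP + lambda * L_rule."""
    return nlp_loss + lam * rule_loss
```

For a two-node tree (one parent, one child) with scalar features 1.0 and 3.0 and an identity weight, one layer averages the two features, so both nodes end up with a value of about 2.0. In a trainable implementation, `H`, `A`, and `W` would be tensors and `lam` would trade task quality against rule satisfaction.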

2. The method for generating audit document mind maps based on feature refinement and large language models according to claim 1, characterized in that: Step A: Extract the subject information, main text information, reference information and hierarchical information from the target audit document and convert them into structured audit text information. The target audit document is stored in PDF format, and the structured audit text information is stored in JSON format.

3. The method for generating audit document mind maps based on feature refinement and large language models according to claim 2, characterized in that: in Step B, a hybrid weighting model is used to calculate the global importance of keywords in the structured audit text information and extract the target text keywords. The global importance of a keyword measures the semantic relevance between the keyword and the topic of the target audit document. The hybrid weighting model combines TF-IDF weighting with Sentence-BERT similarity, as shown in formula (1), where $S(w_i)$ is the global importance of candidate keyword $w_i$, $\alpha$ is the weight parameter, $\mathrm{TFIDF}(w_i)$ is the term frequency-inverse document frequency value of $w_i$, and $\mathrm{sim}(w_i, v_d)$ is the semantic similarity between $w_i$ and the document topic vector $v_d$.
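A hybrid keyword score of the kind described in claim 3 can be sketched as follows. The text does not give the exact form of formula (1), so this sketch assumes a convex combination $\alpha \cdot \mathrm{TFIDF} + (1-\alpha) \cdot \mathrm{sim}$; the smoothed IDF variant is also an assumption, and the toy vectors stand in for Sentence-BERT embeddings, which a real pipeline would compute with an embedding model.

```python
import math
from collections import Counter
from typing import List, Sequence


def tfidf(term: str, doc: List[str], corpus: List[List[str]]) -> float:
    """TF-IDF of `term` in `doc`, using a smoothed IDF over `corpus`."""
    tf = Counter(doc)[term] / len(doc)
    df = sum(1 for d in corpus if term in d)
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0
    return tf * idf


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def global_importance(term, doc, corpus, term_vec, topic_vec, alpha=0.5):
    """Assumed form of formula (1): alpha*TF-IDF + (1-alpha)*similarity."""
    return alpha * tfidf(term, doc, corpus) + (1 - alpha) * cosine(term_vec, topic_vec)
```

Because raw TF-IDF values are not bounded to [0, 1] while cosine similarity is, a practical implementation would usually normalize the TF-IDF scores before mixing them; the weight `alpha` then has a more predictable effect.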

4. The method for generating audit document mind maps based on feature refinement and large language models according to claim 3, characterized in that: in Step C, hierarchical clustering and dynamic weighting are applied to the target text keywords to generate a tree structure that conforms to the logic of the target audit document, specifically as follows: Step C1: perform hierarchical clustering of the target text keywords using Sentence-BERT embeddings, with the similarity computed as shown in formula (2):
$$\mathrm{sim}(v_i, v_j) = \frac{v_i \cdot v_j}{\lVert v_i \rVert \, \lVert v_j \rVert} \quad (2)$$
where $\mathrm{sim}(v_i, v_j)$ is the similarity between node vectors $v_i$ and $v_j$, both of which are node vectors corresponding to target text keywords; Step C2: use a dynamic weighting algorithm to dynamically weight and evaluate the target text keywords, as shown in formula (3), where $F(n)$ is the dynamic comprehensive evaluation value of the current node $n$, $C(n)$ is the coverage of the nodes corresponding to the already generated text keywords, $\lambda(d)$ is a dynamically weighted adjustment term based on semantic distance, and $d$ is the semantic distance between the node corresponding to the current text keyword and the target node.
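Step C1 can be illustrated with average-linkage agglomerative clustering over cosine similarity. This is one standard way to perform hierarchical clustering, not necessarily the linkage the patent uses; the stopping `threshold`, the helper name `agglomerate`, and the toy vectors standing in for Sentence-BERT embeddings are all assumptions. The dynamic weighting of Step C2 is not implemented because the form of formula (3) is not given in the text.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Formula (2): cosine similarity between two node vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def agglomerate(vectors: List[Sequence[float]], labels: List[str],
                threshold: float = 0.5):
    """Average-linkage agglomerative clustering on cosine similarity.

    Returns the final clusters and the merge history (the dendrogram edges).
    """
    clusters = [[i] for i in range(len(vectors))]
    merges: List[Tuple[List[str], List[str], float]] = []

    def linkage(a: List[int], b: List[int]) -> float:
        pairs = [cosine(vectors[i], vectors[j]) for i in a for j in b]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        best, x, y = max(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters)) for q in range(p + 1, len(clusters))
        )
        if best < threshold:
            break  # remaining clusters become separate branches
        merges.append(([labels[i] for i in clusters[x]],
                       [labels[i] for i in clusters[y]], best))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return [[labels[i] for i in c] for c in clusters], merges
```

The merge history records which keyword groups were joined and at what similarity, which is the information needed to turn the clustering into a tree of parent and child topics.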

5. The method for generating audit document mind maps based on feature refinement and large language models according to claim 4, characterized in that: in Step E, the probability distribution of child nodes in the target constraint text is calculated, a confidence threshold is set based on the result, and node pruning and hierarchical filtering are performed on the target constraint text to obtain the target refined text, specifically as follows: Step E1: calculate the probability distribution of child nodes in the target constraint text using a temperature coefficient and Top-k sampling, as shown in formula (7):
$$p'(w_i) = \frac{\exp\left(p(w_i)/T\right)}{\sum_{w_j \in K} \exp\left(p(w_j)/T\right)} \quad (7)$$
where $p'(w_i)$ is the adjusted target probability of keyword $w_i$, $\exp(\cdot)$ is the exponential function with base $e$, $p(w_i)$ is the original probability of keyword $w_i$, $p(w_j)$ is the original probability of each keyword $w_j$ in the candidate set $K$, and $T$ is the temperature coefficient; Step E2: set a confidence threshold based on the child node probability distribution and perform node pruning and hierarchical filtering on the target constraint text to obtain the target refined text.
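Formula (7) and the threshold pruning of Step E2 can be sketched as below. The candidate set $K$ is taken to be the `k` most probable children, and the default `k=3` and confidence threshold of 0.2 are illustrative assumptions; `temperature_top_k` and `prune_children` are hypothetical names.

```python
import math
from typing import Dict, List


def temperature_top_k(probs: Dict[str, float], k: int = 3,
                      temperature: float = 1.0) -> Dict[str, float]:
    """Formula (7): renormalize exp(p/T) over the top-k candidate set K."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    weights = {w: math.exp(p / temperature) for w, p in top}
    z = sum(weights.values())
    return {w: v / z for w, v in weights.items()}


def prune_children(probs: Dict[str, float], threshold: float = 0.2,
                   k: int = 3, temperature: float = 1.0) -> List[str]:
    """Step E2 (sketch): keep child nodes whose adjusted probability >= threshold."""
    adjusted = temperature_top_k(probs, k, temperature)
    ranked = sorted(adjusted.items(), key=lambda kv: kv[1], reverse=True)
    return [w for w, p in ranked if p >= threshold]
```

With original probabilities 0.5, 0.3, 0.15, and 0.05 for four candidate children, `k=3` always drops the fourth. At `T=1.0` the other three survive a 0.2 threshold, while at `T=0.1` the distribution sharpens and only the most probable child remains. This is how the temperature coefficient controls how detailed the mind map is.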

6. The method for generating audit document mind maps based on feature refinement and large language models according to claim 5, characterized in that: in Step F, a hybrid search is performed in the audit literature knowledge base based on the target refined text to obtain the logical framework of the related audit domain and the related standard entity relationship structure. The audit literature knowledge base integrates multi-source heterogeneous data and uses natural language processing technology to parse, classify, and associate text, forming a knowledge network that contains multi-level audit topics, logical relationships, and format templates. The multi-source heterogeneous data includes audit journals, audit literature, audit standards, regulatory documents, industry reports, and case studies. The specific steps are as follows: Step F1: use a pre-trained text embedding model to vectorize the target refined text and obtain the refined text feature vector; Step F2: in the audit document knowledge base, use a hybrid retrieval strategy that combines semantic retrieval based on dense vector similarity with sparse retrieval based on keyword matching to recall the initial candidate document fragments; Step F3: use a re-ranking model to re-score and re-rank the initial candidate document fragments by their relevance to the target refined text, and extract the top-k ranked content as the logical framework and standard entity relationship structure of the related audit domain.
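The hybrid recall of Step F2 can be sketched by ranking fragments twice, once by dense cosine similarity and once by keyword overlap, and fusing the two rankings. The claim does not say how the rankings are combined. This sketch assumes reciprocal rank fusion with the commonly used constant `k=60`, and it uses a shared-keyword count as a simple stand-in for a sparse scorer such as BM25. The re-ranking model of Step F3 is not modelled; the fused top-k list is what it would re-score.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rrf(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def hybrid_search(query_vec: Sequence[float], query_terms: Set[str],
                  fragments: Dict[str, Tuple[Sequence[float], Set[str]]],
                  top_k: int = 2) -> List[str]:
    """Recall fragments with dense and sparse retrieval, then fuse the rankings."""
    ids = list(fragments)
    dense = sorted(ids, key=lambda i: cosine(query_vec, fragments[i][0]), reverse=True)
    # Sparse stand-in: number of shared keywords (BM25 would be typical here).
    sparse = sorted(ids, key=lambda i: len(query_terms & fragments[i][1]), reverse=True)
    return rrf([dense, sparse])[:top_k]
```

Rank fusion avoids having to put cosine similarities and keyword scores on a common scale, which is the main difficulty when combining dense and sparse retrieval by adding raw scores.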

7. The method for generating audit document mind maps based on feature refinement and large language models according to claim 6, characterized in that: in Step G, a large language model is used to parse the target refined text, the logical framework of the related audit domain, and the related standard entity relationship structure to obtain the mind map content information, specifically as follows: Step G1: combine the target refined text, the logical framework of the related audit domain, and the related standard entity relationship structure into a structured prompt for generating the audit mind map; Step G2: input the structured prompt into a pre-trained large language model, apply format constraints, and generate structured Markdown content with clear parent-child node relationships, which is used as the mind map content information.

8. The method for generating audit document mind maps based on feature refinement and large language models according to claim 7, characterized in that: in Step H, the mind map content information is input into the mind map building tool to generate the target audit document mind map, specifically as follows: Step H1: perform lexical analysis and grammatical rule verification on the mind map content information to obtain the verified mind map content information, and then extract the topic of each node and the tree-like topological relationships between nodes from the verified content based on text indentation features and hierarchical markers; Step H2: convert the node topics and the tree-like topological relationships between nodes into an abstract syntax tree that the mind mapping tool can recognize, and input the abstract syntax tree into the Markdown engine of the mind mapping tool for graphical component mapping and automatic layout calculation to generate the target audit document mind map.
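One rule that the grammatical verification in Step H1 could enforce is that heading levels never skip, for example a `###` heading directly under a `#` heading, since such a jump leaves a node without a well-defined parent. The sketch below checks only this rule. The helper name `check_heading_levels` and the choice of rule are assumptions; the patent does not list its verification rules.

```python
import re
from typing import List, Tuple

HEADING = re.compile(r"^(#{1,6})\s+\S")  # ATX heading with non-empty text


def check_heading_levels(markdown: str) -> List[Tuple[int, str]]:
    """Flag headings that skip a level (e.g. '###' directly under '#')."""
    problems = []
    prev = 0  # level of the previous heading; 0 = document start
    for line_no, line in enumerate(markdown.splitlines(), start=1):
        m = HEADING.match(line)
        if not m:
            continue
        level = len(m.group(1))
        if level > prev + 1:
            problems.append((line_no, f"level {level} follows level {prev}"))
        prev = level
    return problems
```

An empty result means the headings form a valid hierarchy. A non-empty result could be used to ask the model to regenerate the content, or to repair it by lowering the offending heading levels, before the tree is extracted.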

9. An audit document mind map generation system based on feature refinement and large language model, whose generation process follows the audit document mind map generation method according to any one of claims 1-8, characterized in that it includes a text extraction module, a keyword extraction module, a clustering weighting module, a text constraint module, a text refinement module, a text retrieval module, a content information acquisition module, and a mind map generation module. The text extraction module is used to extract the subject information, main text information, reference information, and hierarchical information from the target audit document and convert them into structured audit text information. The keyword extraction module is used to calculate the global importance of keywords in the structured audit text information using a hybrid weighting model and extract the target text keywords. The clustering weighting module is used to apply hierarchical clustering and dynamic weighting to the target text keywords to generate a target text tree structure that conforms to the logic of the target audit document. The text constraint module is used to perform feature aggregation and optimization of the semantic entities and logical relationships in the target text tree structure using a graph neural network to obtain the target optimized topology structure, and then apply rule constraints to the target optimized topology structure to obtain the target constraint text. The text refinement module is used to calculate the probability distribution of child nodes in the target constraint text, set a confidence threshold based on the child node probability distribution result, and perform node pruning and hierarchical filtering on the target constraint text to obtain the target refined text.
The text retrieval module is used to perform mixed retrieval in the audit literature knowledge base based on the target refined text and obtain the logical framework of related audit fields and the structure of related standard entity relationships; The content information acquisition module is used to parse and obtain mind map content information based on the target refined text, the logical framework of the related audit domain and the related standard entity relationship structure using a large language model; The mind map generation module is used to input mind map content information into the mind map building tool and generate a mind map of the target audit document.