Knowledge graph-based intelligent diagnosis system and method for enterprise financial and tax health
By constructing a corporate financial and tax knowledge graph, multimodal data fusion and graph neural network causal tracing are achieved, solving the problems of data silos and insufficient rectification suggestions in corporate financial and tax risk diagnosis, and providing accurate risk quantification and actionable rectification suggestions.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- QINGDAO UNIV OF TECH
- Filing Date
- 2026-03-10
- Publication Date
- 2026-06-12
AI Technical Summary
Existing corporate financial and tax risk diagnosis technologies suffer from problems such as data silos and semantic fragmentation, lack of understanding of the differences between tax and accounting, insufficient modeling of risk transmission mechanisms, and lack of explainable rectification suggestions. This makes it difficult for enterprises to accurately quantify financial and tax risks and generate effective rectification suggestions.
We construct a knowledge graph integrating corporate finance and taxation, achieve multimodal data fusion through entity extraction and rule mapping, use graph neural networks for causal tracing, and generate actionable compliance rectification suggestions.
It enables precise quantification and causal tracing of internal and external financial and tax risks for enterprises, generates actionable rectification suggestions, and improves the comprehensiveness and accuracy of risk assessment.
Figure CN122199172A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of financial big data risk control technology, and in particular to an intelligent diagnostic system and method for enterprise financial and tax health based on knowledge graphs.
Background Technology
[0002] With the development of the digital economy, corporate financial and tax compliance has become a core factor affecting the survival and development of enterprises. Existing corporate financial and tax risk diagnosis technologies are mainly divided into two categories: one is expert systems based on rule engines, which provide early warnings by setting preset financial indicator thresholds (such as tax burden rate and debt-to-equity ratio); the other is statistical models based on traditional machine learning, which make predictions by classifying structured financial statement data.
[0003] However, existing technologies suffer from the following significant drawbacks in practical applications: First, data silos and semantic fragmentation are severe. Existing systems often struggle to integrate unstructured invoice images, contract texts, and structured financial documents, resulting in physical isolation of "business, finance, and tax" data. This makes it impossible to automatically verify whether business documents (such as R&D expenditures) possess a complete chain of evidence (e.g., missing project initiation documents). Second, there is a lack of logical understanding of accounting-tax differences. Accounting standards and tax regulations have objective differences in recognition and measurement (e.g., the deduction limit for business entertainment expenses). Existing technologies often rely on manual adjustments or simple numerical comparisons, failing to explicitly express this logical conflict through graphical structures and making it difficult to automatically generate tax adjustment suggestions. Third, the risk transmission mechanism modeling is insufficient. A company's financial and tax health is heavily influenced by upstream and downstream supply chains (e.g., upstream fraudulent invoicing). Existing methods struggle to quantify the specific extent to which external related risks permeate the company's internal compliance. Finally, there is a lack of explainable rectification suggestions. Existing risk control systems often stop at issuing alerts, failing to automatically generate specific rectification suggestions based on causal reasoning, leaving companies aware of risks but unsure how to address them.
[0004] Therefore, there is an urgent need for a corporate financial and tax health diagnosis method that can integrate multimodal data, deeply understand the logic of tax and accounting differences, and provide intelligent rectification suggestions.
Summary of the Invention
[0005] This invention provides a knowledge graph-based intelligent diagnostic system and method for corporate financial and tax health. By constructing a knowledge graph integrating corporate finance and taxation and a dynamic mapping mechanism for tax and accounting differences, it can accurately quantify and trace the causal origins of internal and external financial and tax risks of enterprises, and automatically generate actionable compliance rectification suggestions based on graph structure reasoning.
[0006] This invention provides a knowledge graph-based intelligent diagnostic method for enterprise financial and tax health, comprising:
[0007] S1. Obtain multimodal financial and tax data of the target enterprise and extract entities and relationships using an entity extraction algorithm to construct an enterprise financial and tax knowledge graph; wherein, the enterprise financial and tax knowledge graph includes target enterprise nodes, business voucher nodes, accounting subject nodes, and related enterprise nodes connected to the target enterprise nodes through transaction relationship edges;
[0008] S2. Construct a first set of rule nodes based on accounting standards and a second set of rule nodes based on tax regulations in the enterprise financial and tax knowledge graph, and configure mapping relationship edges between the first and second rule nodes according to business scenarios; wherein, the first set of rule nodes establishes a settling relationship with the accounting subject nodes, and the second set of rule nodes is used to define tax restriction conditions;
[0009] S3. Identify the first rule node associated with the business voucher node through the accounting subject node, and retrieve the corresponding second rule node based on the mapping relationship. Calculate the attribute difference value between the first rule node and the second rule node, and generate a tax adjustment virtual node when the preset conditions are met. Assign a risk weight to the tax adjustment virtual node based on the attribute difference value.
[0010] S4. Input the enterprise financial and tax knowledge graph processed in S3 into the graph neural network model to aggregate the risk characteristics of the related enterprise nodes to the target enterprise node through the transaction relationship edge, and fuse the risk weight of the tax adjustment virtual node to the target enterprise node to obtain the health feature vector of the target enterprise node;
[0011] S5. When the health feature vector indicates the existence of financial and tax risks, extract the business voucher nodes and their first-order neighbor nodes related to the financial and tax risks, form the local topology structure to be detected, and compare it with the preset compliance subgraph template to map the missing entity node types into rectification suggestion text, and generate a financial and tax health diagnosis report containing the rectification suggestion text.
[0012] Furthermore, S1 specifically includes:
[0013] S101. Access the target enterprise's business system, obtain multimodal financial and tax data, and perform deduplication and noise reduction preprocessing; wherein, the multimodal financial and tax data includes structured data, unstructured image data, and unstructured text data;
[0014] S102. Use a deep learning optical character recognition model to identify key field information in the unstructured image data; use natural language processing technology to parse semantic information in the unstructured text data;
[0015] S103. Instantiate the target enterprise node, business voucher node, accounting subject node and related enterprise node from the processed multimodal financial and tax data using the entity extraction model and rule mapping engine.
[0016] S104. Construct the connections between nodes based on business logic, including the aggregation relationship edges from business voucher nodes to accounting subject nodes, and the transaction relationship edges connecting related enterprise nodes and target enterprise nodes.
[0017] S105. Store the generated nodes and edges in the graph database, and establish a time index for the business voucher nodes to complete the construction of the enterprise financial and tax knowledge graph.
[0018] Furthermore, S2 specifically includes:
[0019] S201. Access the digitalized enterprise accounting standards, enterprise income tax law and its implementing regulations, parse the logical control clauses therein and transform them into structured rule entities; wherein, the structured rule entities include applicable objects, calculation base and deduction ratio attributes;
[0020] S202. Instantiate the first set of rule nodes in the enterprise financial and tax knowledge graph based on accounting standards, and establish a settling relationship edge from the accounting subject node to the corresponding first rule node;
[0021] S203. Instantiate the second set of rule nodes in the enterprise's financial and tax knowledge graph based on tax regulations, and configure calculation attribute parameters for each second rule node; wherein, the calculation attribute parameters include the pre-tax deduction ratio, the pre-tax deduction limit standard and limit ratio, the additional deduction ratio and the applicable tax rate level;
[0022] S204. Based on the preset tax and accounting difference comparison table, identify the logical relationship between the first rule node and the second rule node, and establish a directed mapping relationship edge.
[0023] S205. Configure a dynamic value retrieval interface for the second rule node that relies on dynamic data, so as to obtain global data from the enterprise financial and tax knowledge graph at runtime for limit calculation.
[0024] Furthermore, S3 specifically includes:
[0025] S301. Traverse the enterprise financial and tax knowledge graph, filter the business voucher nodes to be detected based on the time index, and locate the accounting subject nodes to which they belong along the aggregation relationship edge.
[0026] S302. Perform a chained indexing operation to identify the first rule node associated with the accounting subject node, and index along the mapping relationship edge to the corresponding second rule node;
[0027] S303. Call the calculation attribute parameters of the second rule node to calculate the tax-permitted amount for the business voucher node, and record the absolute value of the difference between the book amount of the business voucher node and the tax-permitted amount as the attribute difference value.
[0028] S304. Compare the attribute difference value with a preset ignore threshold. When the attribute difference value is greater than the ignore threshold, instantiate a tax adjustment virtual node in the enterprise financial and tax knowledge graph and establish a connection edge from the business voucher node to the tax adjustment virtual node.
[0029] S305. The attribute difference values are normalized to generate risk weights, and the risk weights are assigned to the tax adjustment virtual node; wherein, the risk weights are calculated using the following nonlinear function:
[0030] $$W = \frac{1}{1 + e^{-k(\Delta - \theta)}}$$
[0031] Wherein, W represents the risk weight, with a value range of (0,1]; Δ represents the attribute difference value; e represents the base of the natural logarithm; k represents the adjustment coefficient, used to control the sensitivity of the risk weight to changes in the difference value; θ represents the offset coefficient, used to control the center trigger position of the risk weight.
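The symbol definitions in [0031] (a sensitivity coefficient k, an offset coefficient that sets the center trigger position, and the base e) are consistent with a logistic function. The following is a minimal sketch of S304 and S305 under that assumption. The function names, the default values of `k` and `offset`, and the `ignore_threshold` value are hypothetical and are not taken from the patent.

```python
import math
from typing import Optional


def risk_weight(diff: float, k: float = 0.001, offset: float = 5000.0) -> float:
    """Map an attribute difference value to a risk weight between 0 and 1.

    Assumed logistic form: W = 1 / (1 + e^(-k * (diff - offset))).
    k controls sensitivity; offset is the difference at which W equals 0.5.
    """
    return 1.0 / (1.0 + math.exp(-k * (diff - offset)))


def tax_adjustment_weight(book_amount: float, permitted_amount: float,
                          ignore_threshold: float = 100.0) -> Optional[float]:
    """Return a risk weight when the difference exceeds the ignore threshold (S304).

    Returns None when the difference is small enough that no tax adjustment
    virtual node would be created.
    """
    diff = abs(book_amount - permitted_amount)  # attribute difference value (S303)
    if diff <= ignore_threshold:
        return None
    return risk_weight(diff)  # weight assigned to the virtual node (S305)
```

With these illustrative defaults, a difference equal to `offset` maps to a weight of 0.5, and larger differences move the weight toward 1. In practice `k` and `offset` would need to be tuned to the scale of the amounts involved.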
[0032] Furthermore, in S303, calculating the tax-permitted amount for the business voucher node specifically includes:
[0033] Determine the restriction type of the second rule node. If the restriction type is a fixed-rate deduction type, multiply the book amount of the business voucher node by the deduction ratio in the calculation attribute parameter to obtain the tax-permitted amount.
[0034] If the restriction type is a deduction limit type, then the global operating data in the enterprise's financial and tax knowledge graph is aggregated through the dynamic value retrieval interface as the calculation base, and the calculation base is multiplied by the limit ratio in the calculation attribute parameter to obtain the annual deduction limit value, and the annual deduction limit value is used as the amount allowed by the tax law.
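The two calculation branches in [0033] and [0034] can be sketched as follows. This is an illustration only: the `TaxRule` class, its field names, and the restriction type strings are hypothetical stand-ins for the calculation attribute parameters of a second rule node.

```python
from dataclasses import dataclass


@dataclass
class TaxRule:
    """Hypothetical stand-in for a second rule node's calculation attributes."""
    restriction_type: str  # "fixed_rate" or "limit"
    ratio: float           # deduction ratio, or limit ratio for the limit type


def tax_permitted_amount(rule: TaxRule, book_amount: float,
                         calculation_base: float = 0.0) -> float:
    """Compute the tax-permitted amount for one business voucher node."""
    if rule.restriction_type == "fixed_rate":
        # Fixed-rate deduction: book amount multiplied by the deduction ratio.
        return book_amount * rule.ratio
    if rule.restriction_type == "limit":
        # Deduction limit: aggregated calculation base multiplied by the limit ratio.
        return calculation_base * rule.ratio
    raise ValueError(f"unknown restriction type: {rule.restriction_type}")
```

For example, a fixed-rate rule with a ratio of 0.6 applied to a book amount of 10,000 gives a tax-permitted amount of 6,000, while a limit rule with a ratio of 0.005 and a calculation base of 1,000,000 gives an annual limit of 5,000.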
[0035] Furthermore, S4 specifically includes:
[0036] S401. Load the pre-trained graph neural network model and initialize the feature embedding of each node in the enterprise financial and tax knowledge graph.
[0037] S402. Define two message passing channels in the graph neural network model: a supply chain risk channel based on transaction relationship edges and an internal compliance channel based on tax adjustment virtual nodes.
[0038] S403. In the supply chain risk channel, the attention coefficient of the related enterprise node to the target enterprise node is calculated using the graph attention mechanism, and the risk characteristics of the related enterprise node are aggregated according to the attention coefficient to obtain the external risk feature vector.
[0039] S404. In the internal compliance channel, the risk weights of all tax adjustment virtual nodes connected to the target enterprise's business voucher chain are pooled to generate an internal risk feature vector.
[0040] S405. The external risk feature vector and the internal risk feature vector are fused, and the dimensions are transformed through a fully connected layer to generate the updated health feature vector of the target enterprise node.
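The internal compliance channel (S404) and the fusion step (S405) can be sketched in plain Python. The patent does not fix the pooling operator or the fusion method, so this sketch assumes mean, max, and count pooling, and fusion by concatenation followed by one fully connected layer. All names and the example weights are hypothetical.

```python
from typing import List


def pool_internal_risk(weights: List[float]) -> List[float]:
    """Pool tax adjustment risk weights into a fixed-size vector.

    Assumption: [mean, max, count] pooling; other operators are possible.
    """
    if not weights:
        return [0.0, 0.0, 0.0]
    return [sum(weights) / len(weights), max(weights), float(len(weights))]


def fuse(external: List[float], internal: List[float],
         fc_weight: List[List[float]], fc_bias: List[float]) -> List[float]:
    """Concatenate both channel vectors and apply one fully connected layer."""
    x = external + internal  # list concatenation
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(fc_weight, fc_bias)]
```

In a trained model the fully connected weights would be learned parameters; here they are passed in explicitly so that the data flow stays visible.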
[0041] Furthermore, in step S403, the formula for calculating the external risk feature vector is:
[0042] $$h_i^{ext} = \sigma\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij} W_s h_j\right)$$
[0043] Wherein, $h_i^{ext}$ represents the aggregated external risk feature vector; $\sigma$ represents a nonlinear activation function; $\mathcal{N}(i)$ represents the set of associated enterprise nodes connected to the target enterprise node i; $\alpha_{ij}$ represents the attention coefficient of the associated enterprise node j towards the target enterprise node i; $W_s$ represents the learnable weight matrix of the model along the supply chain risk path; $h_j$ represents the initial feature vector of the associated enterprise node j.
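The attention-weighted aggregation described in [0043] can be sketched in plain Python. This is a sketch under stated assumptions, not the patented model: the attention coefficients are obtained here by a softmax over externally supplied raw scores, the activation is assumed to be ReLU, and all function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    """Normalize raw attention scores so that they sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W: Matrix, h: Vector) -> Vector:
    """Multiply matrix W by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def aggregate_external_risk(neighbor_feats: List[Vector], scores: Vector,
                            W: Matrix) -> Vector:
    """Compute relu(sum_j alpha_j * W h_j) with alpha = softmax(scores)."""
    alphas = softmax(scores)
    out = [0.0] * len(W)
    for alpha, h_j in zip(alphas, neighbor_feats):
        for d, value in enumerate(matvec(W, h_j)):
            out[d] += alpha * value
    return [max(0.0, v) for v in out]  # ReLU as the assumed activation
```

A graph attention layer would normally also learn the scoring function that produces `scores`; that part is omitted here to keep the aggregation step itself in focus.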
[0044] Furthermore, S5 specifically includes:
[0045] S501. Input the updated health feature vector into the classification model and calculate the risk category probability distribution; when the probability of a specific risk type exceeds the preset safety threshold, determine that there is a financial and tax risk and identify the risk source business voucher node that caused the risk.
[0046] S502. Perform neighborhood sampling with the risk source business voucher node as the center, extract its first-order neighbor nodes and edges, and construct the local topology structure to be detected; wherein, the first-order neighbor nodes are entity nodes that have a direct reference relationship with the risk source business voucher node, including invoice nodes, contract nodes, supporting document nodes, approval personnel nodes and accounting subject nodes.
[0047] S503. According to the specific risk type, call the corresponding compliance subgraph template from the pre-set library, and use a graph matching algorithm to compare the local topology to be detected with the compliance subgraph template; wherein, the compliance subgraph template is constructed based on the financial and tax audit compliance standards, and defines the set of entity node types and their connection topology relationships that are necessary to achieve compliance in a specific business scenario;
[0048] S504. Calculate the difference set of the node set of the compliant subgraph template relative to the local topology structure to be detected, thereby determining the type of missing entity node;
[0049] S505. Query the natural language operation guidance corresponding to the missing entity node type in the knowledge base of the suggestion mapping, combine it with the voucher attribute information of the risk source business voucher node, generate a rectification instruction containing specific business positioning as a rectification suggestion text, and summarize the rectification suggestion text to generate the financial and tax health diagnosis report.
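Steps S503 to S505 can be reduced to a set difference followed by a lookup. The sketch below compares node types only and ignores the connection topology that a full graph matching algorithm would also check. The template contents, the suggestion texts, and all identifiers are hypothetical examples.

```python
from typing import Dict, List, Set

# Hypothetical template library: risk type -> entity node types required for compliance.
COMPLIANCE_TEMPLATES: Dict[str, Set[str]] = {
    "rd_expense": {"invoice", "contract", "project_initiation_document", "approver"},
}

# Hypothetical suggestion mapping knowledge base: node type -> operation guidance.
SUGGESTIONS: Dict[str, str] = {
    "invoice": "attach the corresponding invoice",
    "contract": "attach the signed contract",
    "project_initiation_document": "attach the project initiation document",
    "approver": "record the approving person",
}


def missing_node_types(risk_type: str, neighbor_types: Set[str]) -> Set[str]:
    """Template node types minus the types found among first-order neighbors (S504)."""
    return COMPLIANCE_TEMPLATES[risk_type] - neighbor_types


def rectification_text(voucher_no: str, risk_type: str,
                       neighbor_types: Set[str]) -> List[str]:
    """Turn each missing node type into a suggestion located by voucher number (S505)."""
    missing = sorted(missing_node_types(risk_type, neighbor_types))
    return [f"Voucher {voucher_no}: {SUGGESTIONS[t]}." for t in missing]
```

Neighbor types that are not part of the template, such as an accounting subject node, are simply ignored by the set difference, so extra evidence never produces a suggestion.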
[0050] This invention also provides a knowledge graph-based intelligent diagnostic system for enterprise financial and tax health. Based on the knowledge graph-based intelligent diagnostic method for enterprise financial and tax health described above, the system includes:
[0051] The acquisition module is used to acquire multimodal financial and tax data of the target enterprise and extract entities and relationships using an entity extraction algorithm to construct an enterprise financial and tax knowledge graph; wherein, the enterprise financial and tax knowledge graph includes target enterprise nodes, business voucher nodes, accounting subject nodes, and related enterprise nodes connected to the target enterprise nodes through transaction relationship edges;
[0052] The construction module is used to construct a first set of rule nodes based on accounting standards and a second set of rule nodes based on tax regulations in the enterprise financial and tax knowledge graph, and to configure mapping relationship edges between the first and second rule nodes according to business scenarios; wherein, the first set of rule nodes establishes a settling relationship with the accounting subject nodes, and the second set of rule nodes is used to define tax restriction conditions;
[0053] The identification module is used to identify the first rule node associated with the business voucher node through the accounting subject node, and retrieve the corresponding second rule node based on the mapping relationship, calculate the attribute difference value between the first rule node and the second rule node, generate a tax adjustment virtual node when the preset conditions are met, and assign a risk weight to the tax adjustment virtual node based on the attribute difference value.
[0054] The aggregation module is used to input the processed enterprise financial and tax knowledge graph into the graph neural network model, so as to aggregate the risk characteristics of the related enterprise nodes to the target enterprise node through the transaction relationship edge, and fuse the risk weight of the tax adjustment virtual node to the target enterprise node to obtain the health feature vector of the target enterprise node;
[0055] The generation module is used to extract business voucher nodes and their first-order neighbor nodes related to the financial and tax risks when the health feature vector indicates the existence of financial and tax risks, to form a local topology structure to be detected and compare it with a preset compliance subgraph template, so as to map the missing entity node types into rectification suggestion text and generate a financial and tax health diagnosis report containing the rectification suggestion text.
[0056] The present invention also provides a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the above-described method.
[0057] The present invention also provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the above-described method.
[0058] The beneficial effects of this invention are as follows:
[0059] This invention constructs a multimodal enterprise financial and tax knowledge graph encompassing target enterprises, business vouchers, accounting subjects, and related enterprises, breaking down data silos between business, finance, tax, and legal systems. It utilizes entity extraction algorithms to achieve semantic alignment between unstructured vouchers and structured accounts. A two-layer mapping structure for tax and accounting rules is constructed. By establishing mapping edges between accounting standard nodes and tax law rule nodes, it enables automated calculation of tax and accounting differences (such as permanent and temporary differences) and instantiation of virtual nodes for tax adjustments, accurately quantifying potential tax compliance risks. Furthermore, this invention introduces graph neural networks (GNNs) and a dual-channel message passing mechanism, which not only integrates internal enterprise compliance risks but also aggregates external contagion risks from upstream and downstream of the supply chain through transaction relationship edges, improving the comprehensiveness and accuracy of risk assessment. Finally, based on subgraph matching technology, abstract risk characteristics are reverse-mapped into specific missing entities, providing enterprises with actionable financial and tax health rectification suggestions.
Attached Figure Description
[0060] Figure 1 is a schematic diagram of the method flow according to an embodiment of the present invention.
[0061] Figure 2 is a schematic diagram of the device structure according to an embodiment of the present invention.
[0062] Figure 3 is a schematic diagram of the internal structure of a computer device according to an embodiment of the present invention.
[0063] The realization of the objectives, functional features, and advantages of the present invention will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.
Detailed Implementation
[0064] It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
[0065] As shown in Figure 1, this invention provides a knowledge graph-based intelligent diagnostic method for enterprise financial and tax health, including:
[0066] S1. Obtain multimodal financial and tax data of the target enterprise and extract entities and relationships using an entity extraction algorithm to construct an enterprise financial and tax knowledge graph; wherein, the enterprise financial and tax knowledge graph includes target enterprise nodes, business voucher nodes, accounting subject nodes, and related enterprise nodes connected to the target enterprise nodes through transaction relationship edges.
[0067] In one embodiment, step S1 specifically includes the following sub-steps S101 to S105:
[0068] S101. Access the target company's ERP system, financial accounting system, tax filing system, and supply chain management system via API interfaces or database connectors to obtain multimodal financial and tax data. The multimodal financial and tax data specifically includes:
[0069] Structured data: mainly includes chronological ledgers, account balance sheets, supplier / customer master data, and bank transaction records;
[0070] Unstructured image data: mainly includes scanned copies or photos (formats such as JPG, PDF) of VAT special invoices, general invoices, and bank receipts.
[0071] Unstructured text data: mainly includes electronic texts of purchase contracts, sales contracts, and logistics documents.
[0072] The above data is preprocessed to remove duplicate data and data with incorrect formatting, and the image data is converted to grayscale and denoised.
[0073] S102. For the unstructured data obtained in S101, perform the following processing:
[0074] For invoice image data: use a deep learning OCR model (such as PP-OCR) to recognize key fields such as invoice code, invoice number, invoice date, buyer's name, seller's name, amount and tax amount;
[0075] For contract text data: Natural language processing techniques are used for word segmentation and stop word removal to prepare for subsequent entity extraction. The focus is on analyzing key information in the contract, such as the subject matter, transaction amount, and payment terms.
[0076] S103. Using a pre-trained entity extraction model (such as the BERT-BiLSTM-CRF model) and a rule mapping engine, extract and instantiate the following four types of core nodes from the data processed in S101 and S102:
[0077] ① Target Enterprise Node: Instantiated based on the business registration information of the target enterprise, serving as the central node of the graph, and initialized with a health feature vector (dimension such as 1×N, where N is the number of features) with all zeros or a preset baseline value.
[0078] ② Business voucher node: Each financial voucher is instantiated as a node. The node attributes include voucher number, posting date, summary, and debit / credit amount;
[0079] ③ Accounting Subject Nodes: Based on the accounting standards used by the enterprise (such as the "Enterprise Accounting Standards"), instantiate the lowest-level accounting subjects (such as administrative expenses - business entertainment expenses) as nodes;
[0080] ④ Associated Enterprise Nodes: Instantiate upstream and downstream enterprise nodes (such as suppliers and customers) based on the seller / buyer name in the invoice or the business entity name in the document summary.
[0081] S104. Based on business logic and data reference relationships, construct the edges between nodes:
[0082] Construct aggregation relationship edges: Based on the journal entry information of the voucher, establish directed edges from the business voucher node to the accounting subject node, and record the debit / credit direction and amount of the journal entry in the edge attributes;
[0083] Constructing transaction relationship edges: Based on invoice data or voucher transaction auxiliary accounting information, establish transaction relationship edges from related enterprise nodes to target enterprise nodes or vice versa. These transaction relationship edges not only represent fund transfers but also serve as a channel for risk propagation in subsequent step S4.
[0084] Constructing corroborating relationship edges: Establishing edges from business voucher nodes to the corresponding invoice entities or contract entities to form a business evidence chain.
[0085] S105. Store the entity nodes generated in S103 and the relation edges generated in S104 into a graph database (such as Neo4j) to complete the initial construction of the enterprise financial and tax knowledge graph. During the storage process, a time index is created for each business voucher node, and a hierarchical index is created for the accounting subject node to facilitate the rapid matching and retrieval of rules in subsequent steps S2 and S3.
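The node and edge construction in S103 to S105 can be sketched with an in-memory structure. The embodiment stores the graph in a graph database such as Neo4j; the `FinTaxGraph` class below is only a hypothetical stand-in used to show the four node types, the typed edges, and the time index on voucher nodes. All identifiers and sample values are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    node_id: str
    node_type: str  # "target_enterprise", "voucher", "subject", or "related_enterprise"
    attrs: Dict[str, object] = field(default_factory=dict)


class FinTaxGraph:
    """In-memory stand-in for the graph database."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.edges: List[Tuple[str, str, str, Dict[str, object]]] = []
        self.time_index: Dict[str, List[str]] = defaultdict(list)  # period -> voucher ids

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node
        if node.node_type == "voucher":
            # Index vouchers by posting period, e.g. "2025-03".
            period = str(node.attrs["posting_date"])[:7]
            self.time_index[period].append(node.node_id)

    def add_edge(self, src: str, dst: str, edge_type: str, **attrs: object) -> None:
        self.edges.append((src, dst, edge_type, attrs))

    def vouchers_in_period(self, period: str) -> List[str]:
        return list(self.time_index.get(period, []))


g = FinTaxGraph()
g.add_node(Node("E0", "target_enterprise"))
g.add_node(Node("V1", "voucher", {"posting_date": "2025-03-15", "amount": 10000.0}))
g.add_node(Node("S1", "subject", {"name": "Administrative expenses - business entertainment expenses"}))
g.add_node(Node("E1", "related_enterprise"))
g.add_edge("V1", "S1", "aggregation", direction="debit", amount=10000.0)
g.add_edge("E1", "E0", "transaction")
```

The time index is what allows a later step to restrict the diagnosis to the vouchers of one accounting period, for example `g.vouchers_in_period("2025-03")`, without scanning every node.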
[0086] S2. Construct a first set of rule nodes based on accounting standards and a second set of rule nodes based on tax regulations in the enterprise financial and tax knowledge graph, and configure mapping relationship edges between the first and second rule nodes according to business scenarios; wherein, the first set of rule nodes establishes a settling relationship with the accounting subject nodes, and the second set of rule nodes is used to define tax restriction conditions.
[0087] In one embodiment, step S2 specifically includes the following sub-steps S201 to S205:
[0088] S201. Pre-access or import a digitized database of laws and regulations, including the *Accounting Standards for Business Enterprises*, the *Enterprise Income Tax Law of the People's Republic of China* and its implementing regulations, and relevant provisional regulations on value-added tax. Perform structured parsing on the aforementioned textual regulations, extracting the logical control clauses and converting them into computer-readable rule entities. For example, the clause "Business entertainment expenses incurred by an enterprise related to its production and operation activities shall be deducted at 60% of the incurred amount" is parsed into structured data containing parameters such as applicable objects, calculation base, and deduction ratio.
[0089] S202. Instantiate the first rule node set (accounting rule layer). Based on the "Enterprise Accounting Standards", the first rule node set is instantiated in batches in the enterprise financial and tax knowledge graph, specifically as follows:
[0090] The first rule node is defined as representing the recognition and measurement rules at the accounting level. Each accounting subject node generated in S1 is traversed, and based on the affiliation of the subject code, a settling relationship edge is established from the accounting subject node to the corresponding first rule node. For example, the "Administrative Expenses - Business Entertainment Expenses" subject node is connected to the "Accounting Standard No. 6602 - Business Entertainment Expenses Accounting Rules" node.
[0091] S203. Instantiate the set of second rule nodes (tax rule layer). Based on tax regulations, instantiate the set of second rule nodes in batches in the enterprise financial and tax knowledge graph.
[0092] Define the second rule node to represent the restrictions at the tax declaration level, and configure calculation attribute parameters for each second rule node. The parameters include, but are not limited to: deduction ratio (e.g., 60%), deduction limit (e.g., 0.5% of the annual sales revenue), additional deduction ratio (e.g., 100% or 75%), tax rate bracket, etc. These parameters will serve as the benchmark data for calculating the difference value in the subsequent S3.
[0093] S204. Based on the preset tax and accounting difference comparison table or expert knowledge base, establish a directed mapping relationship edge between the first rule node and the second rule node, specifically as follows:
[0094] Static mapping: For rules with a clear correspondence, a one-to-one mapping is directly established. For example, the "Accounting - Salary and Wage Calculation Rules" can be mapped to the "Tax Law - Reasonable Salary and Wage Deduction Rules".
[0095] Logical mapping: For rules with logical conflicts, establish mapping edges and mark the conflict type (such as permanent difference or temporary difference).
[0096] The resulting graph logically connects the retrieval path from accounting subjects to tax law restrictions, solving the technical problem that traditional methods cannot automatically associate rules across domains.
[0097] S205. For second rule nodes that partially rely on dynamic data (e.g., the limit usually depends on the total revenue of the year), after establishing the mapping, configure a dynamic value retrieval interface. This interface allows the second rule node to aggregate global data such as main business revenue from the graph constructed in S1 at runtime to calculate the current tax deduction limit standard in real time. That is, when step S3 is executed, traverse the business voucher nodes in the enterprise financial and tax knowledge graph, summarize the transaction amount of specific accounting subject nodes as the calculation base, and calculate the current tax deduction limit in real time in combination with the calculation attribute parameters.
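One way to picture the dynamic value retrieval interface in S205 is as a callable attached to the second rule node, which computes the limit from the postings present in the graph at run time. This is a minimal sketch: `make_limit_resolver`, the subject names, and the sample amounts are hypothetical, and the 0.5% ratio follows the example given for the deduction limit parameter.

```python
from typing import Callable, Iterable, List, Tuple

# Each posting is (accounting subject name, amount), as read from voucher-to-subject edges.
Posting = Tuple[str, float]


def make_limit_resolver(base_subject: str,
                        limit_ratio: float) -> Callable[[Iterable[Posting]], float]:
    """Return a callable that computes the deduction limit when it is invoked."""
    def resolve(postings: Iterable[Posting]) -> float:
        # Sum the amounts posted to the base subject to obtain the calculation base.
        base = sum(amount for subject, amount in postings if subject == base_subject)
        return base * limit_ratio
    return resolve


resolver = make_limit_resolver("main_business_revenue", 0.005)
postings: List[Posting] = [
    ("main_business_revenue", 600000.0),
    ("main_business_revenue", 400000.0),
    ("administrative_expenses", 10000.0),
]
current_limit = resolver(postings)
```

Because the limit is computed on each call, it follows the revenue recorded so far rather than a value fixed when the rule node was created.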
[0098] S3. Identify the first rule node associated with the business voucher node through the accounting subject node, and retrieve the corresponding second rule node based on the mapping relationship. Calculate the attribute difference value between the first rule node and the second rule node, and generate a tax adjustment virtual node when the preset conditions are met. Assign a risk weight to the tax adjustment virtual node based on the attribute difference value.
[0099] In one embodiment, step S3 specifically includes the following sub-steps S301 to S305:
[0100] S301. Activate the differential diagnostic engine to traverse the enterprise financial and tax knowledge graph constructed in S1. To improve computational efficiency, the set of business voucher nodes to be tested can be filtered based on a time index (such as the current accounting period). For each business voucher node in the set, extract its voucher amount attribute and trace upward along the aggregation relationship edge to locate the accounting subject node to which it belongs.
[0101] S302. For each business voucher node to be detected, perform a chained index operation, specifically:
[0102] Check whether the accounting subject node is connected to a first rule node (accounting standard node). If none is found, skip the voucher; if one exists, further check whether any mapping edge originates from that first rule node. If a mapping edge exists, follow the edge to the corresponding second rule node (tax law rule node). This identifies a complete logical path in the graph: voucher node → account node → accounting rule → [mapping edge] → tax law rule.
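The chained index operation of S302 can be sketched as three successive lookups, assuming each edge type is held in its own dictionary. The function `resolve_tax_rule` and all identifiers in the sample data are hypothetical.

```python
def resolve_tax_rule(voucher_id, belongs_to, account_rule, mapping_edge):
    """Follow voucher -> account -> accounting rule -> tax rule.

    Returns the full path as a list, or None when the chain is broken,
    in which case the voucher is skipped.
    """
    account = belongs_to.get(voucher_id)          # aggregation relationship edge
    accounting_rule = account_rule.get(account)   # account -> first rule node
    if accounting_rule is None:
        return None
    tax_rule = mapping_edge.get(accounting_rule)  # mapping relationship edge
    if tax_rule is None:
        return None
    return [voucher_id, account, accounting_rule, tax_rule]


# Hypothetical edge tables.
belongs_to = {"V3": "advertising_expense", "V9": "petty_cash"}
account_rule = {"advertising_expense": "ACC_AD_EXPENSE"}
mapping_edge = {"ACC_AD_EXPENSE": "TAX_AD_LIMIT"}

path = resolve_tax_rule("V3", belongs_to, account_rule, mapping_edge)
skipped = resolve_tax_rule("V9", belongs_to, account_rule, mapping_edge)
```

Here `V9` is skipped because its accounting subject has no first rule node, which mirrors the "if none is found, skip the voucher" branch.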
[0103] S303. After the second rule node is determined, read the restriction type attribute of the node and execute differentiated calculation logic according to the type, specifically including the following two processing modes:
[0104] Mode 1: Fixed-rate deduction calculation mode. When the second rule node is identified as a fixed-rate deduction type (e.g., the provision in the Corporate Income Tax Law that only 60% of the incurred business entertainment expenses are deductible), read the deduction ratio parameter of the second rule node (e.g., 0.6) and the book amount of the current business voucher node, and directly calculate the amount allowed by tax law as the book amount multiplied by the deduction ratio. Then calculate the attribute difference value as the difference between the book amount and the amount allowed by tax law; in this mode, the attribute difference value is the amount of the permanent difference arising from the voucher itself.
[0105] Mode 2: Limit Deduction Calculation Mode. When the second rule node is identified as a limit deduction type (e.g., the provisions of the Enterprise Income Tax Law regarding advertising expenses not exceeding 15% of the annual sales revenue, or employee education expenses not exceeding 8% of the total wages and salaries), the following cascading calculation is executed:
[0106] The dynamic value retrieval interface configured in S205 is invoked to perform an aggregation query in the graph. For example, all nodes in the graph belonging to the main business revenue and other business revenue categories are retrieved, and their amounts are summed to obtain the calculation base (e.g., total revenue for the year); the limit ratio parameter (e.g., 0.15) is read, and the current statutory annual deduction limit is calculated as the calculation base multiplied by the limit ratio. The calculated limit is defined as the amount allowed by tax law; the graph is then queried for the cumulative amount incurred for this accounting subject in the current year;
[0107] If the cumulative amount does not exceed the annual deduction limit, it is determined that the current voucher carries no overspending risk, and the attribute difference value is zero;
[0108] If the cumulative amount exceeds the annual deduction limit, it is determined that there is overspending, and the contribution of the current voucher to the overspent portion is further calculated and used as the attribute difference value.
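The two calculation modes of S303 can be illustrated with the following sketch. The specification does not state how a single voucher's contribution to the overspent portion is allocated; the sketch assumes the contribution is the overspend capped at the voucher's own book amount, and that the cumulative amount already includes the current voucher. Both function names are hypothetical.

```python
def fixed_rate_difference(book_amount, deduction_ratio):
    """Mode 1: difference between book amount and the tax-allowed amount."""
    allowed = book_amount * deduction_ratio
    return book_amount - allowed


def limit_difference(book_amount, cumulative_amount, annual_limit):
    """Mode 2: contribution of this voucher to the overspent portion.

    Assumption: cumulative_amount includes the current voucher, and the
    voucher's contribution is capped at its own book amount.
    """
    overspend = cumulative_amount - annual_limit
    if overspend <= 0:
        return 0.0  # within the limit, no overspending risk
    return min(book_amount, overspend)


# Entertainment voucher of 10,000 with a 60% deduction ratio.
entertainment_diff = fixed_rate_difference(10_000.0, 0.6)
# Advertising voucher of 50,000; 180,000 spent so far against a 150,000 limit.
advertising_diff = limit_difference(50_000.0, 180_000.0, 150_000.0)
```

With these sample figures the entertainment voucher yields a difference of about 4,000 and the advertising voucher contributes 30,000 to the overspend.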
[0109] S304. Compare the calculated attribute difference value with the preset ignore threshold (for example, 100 yuan, used to ignore minor differences).
[0110] If the attribute difference value does not exceed the ignore threshold, no substantial risk is identified and no action is taken.
[0111] If the attribute difference value exceeds the ignore threshold, a tax-accounting discrepancy exists and a write operation is performed in the graph, specifically: a new tax adjustment virtual node is instantiated and its node label is set to "Tax_Adjustment"; an edge labeled `generates_risk` is created from the business voucher node to the tax adjustment virtual node; and the difference amount and the difference type (e.g., increase or decrease) are written into the attributes of the tax adjustment virtual node.
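The write operation of S304 can be sketched against an in-memory graph, assuming nodes are stored in a dictionary and edges in a list of triples. The label "Tax_Adjustment" and the edge label `generates_risk` come from the description; the function name, node identifier scheme, and attribute keys are hypothetical.

```python
IGNORE_THRESHOLD = 100.0  # yuan, as in the example given for S304


def write_tax_adjustment(graph, voucher_id, difference, difference_type):
    """Create a tax adjustment virtual node when the difference is material."""
    if difference <= IGNORE_THRESHOLD:
        return None  # minor difference: nothing is written
    node_id = f"adj:{voucher_id}"  # hypothetical identifier scheme
    graph["nodes"][node_id] = {
        "label": "Tax_Adjustment",
        "difference": difference,
        "difference_type": difference_type,
    }
    graph["edges"].append((voucher_id, "generates_risk", node_id))
    return node_id


graph = {"nodes": {}, "edges": []}
created = write_tax_adjustment(graph, "V7", 4_000.0, "increase")
ignored = write_tax_adjustment(graph, "V8", 50.0, "increase")
```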
[0112] S305. To enable the subsequent GNN model in S4 to handle amounts of different magnitudes, the attribute difference value is normalized to generate the risk weight W. Specifically, a nonlinear activation function (such as the Sigmoid or Log function) is used to map the amount to the (0,1] interval.
[0113] $W = \dfrac{1}{1 + e^{-k(\Delta - \theta)}}$
[0114] where $\Delta$ represents the attribute difference value; e represents the base of the natural logarithm; k represents the adjustment coefficient, used to control the sensitivity of the risk weight to changes in the difference value; and $\theta$ represents the offset coefficient, used to control the central trigger position of the risk weight. The calculated W is assigned to the tax adjustment virtual node and serves as a quantitative indicator of the degree to which this node affects the health status of the target enterprise.
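A minimal sketch of the Sigmoid normalization of S305 follows. The parameter values for the adjustment coefficient and offset coefficient are assumptions chosen only to make the example concrete; the branch on the sign of the exponent is a standard numerical-stability measure and is not part of the original description.

```python
import math


def risk_weight(difference, k, offset):
    """Map an attribute difference value to a risk weight with a sigmoid.

    k controls sensitivity; offset is the difference at which W equals 0.5.
    """
    z = k * (difference - offset)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form to avoid overflow in exp().
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical parameters: trigger centred at 5,000 yuan.
K, OFFSET = 0.001, 5_000.0
w_small = risk_weight(500.0, K, OFFSET)
w_centre = risk_weight(5_000.0, K, OFFSET)
w_large = risk_weight(30_000.0, K, OFFSET)
```

The weight rises monotonically with the difference, so larger tax-accounting discrepancies contribute more to the internal risk features aggregated in S4.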
[0115] S4. Input the enterprise financial and tax knowledge graph processed in S3 into the graph neural network model to aggregate the risk characteristics of the related enterprise nodes to the target enterprise node through the transaction relationship edge, and fuse the risk weight of the tax adjustment virtual node to the target enterprise node to obtain the health feature vector of the target enterprise node.
[0116] In one embodiment, step S4 specifically includes the following sub-steps S401 to S405:
[0117] S401. Load the pre-trained Graph Neural Network (GNN) model, preferably using a Graph Attention Network (GAT) architecture, and initialize feature embeddings for each type of node in the graph processed in S3. Specifically:
[0118] The target enterprise node is initialized with a feature vector that includes the company's basic attributes (such as registered capital and years of establishment); the related enterprise node is initialized with a feature vector constructed from external multidimensional data, including business status (normal / cancelled), tax credit rating, and whether there is any judicial litigation; for the tax adjustment virtual node, the risk weight W calculated in S3 is mapped to the feature scalar of the node.
[0119] S402. Because the graph contains different types of edges, the graph neural network model defines two independent message passing channels, specifically:
[0120] Supply chain risk channel: Based on transaction relationships, used to transmit externally related risks;
[0121] Internal compliance channel: Based on the risk-generating edge and the aggregation relationship edge, it is used to transmit internal tax difference risks.
[0122] S403. In the supply chain risk channel, the model executes a message passing mechanism to calculate the attention coefficient $\alpha_{ij}$ of the related enterprise node j to the target enterprise node i. The magnitude of this coefficient is directly proportional to the transaction amount and frequency between the two parties; that is, the larger the transaction amount of a supplier, the greater the weight of that supplier's risk characteristics in influencing the target enterprise. According to the attention coefficient, the features of the related enterprise nodes are weighted and aggregated to calculate the external risk feature vector $\mathbf{h}_{ext}$:
[0123] $\mathbf{h}_{ext} = \sigma\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij} \mathbf{W}_{s} \mathbf{h}_{j}\right)$
[0124] where $\mathcal{N}(i)$ is the set of related enterprise nodes connected to the target enterprise node i, $\mathbf{W}_{s}$ is a learnable weight matrix, $\mathbf{h}_{j}$ is the initial feature vector of the related enterprise node j, and $\sigma$ is the activation function. The external risk feature vector characterizes the degree to which upstream and downstream risks in the supply chain penetrate the target enterprise.
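The weighted aggregation of S403 can be illustrated in plain Python as follows. This is a simplification under stated assumptions: a trained GAT would learn its attention coefficients, whereas the sketch sets each coefficient to the neighbour's share of total transaction amount, uses tanh as the activation function, and takes the weight matrix as given. All names are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def external_risk_vector(neighbors, weight_matrix):
    """Attention-weighted aggregation of related enterprise features."""
    out = [0.0] * len(weight_matrix)
    total = sum(n["amount"] for n in neighbors)
    if total == 0:
        return out  # no transactions, no external risk signal
    for n in neighbors:
        alpha = n["amount"] / total  # assumed attention coefficient
        message = matvec(weight_matrix, n["features"])
        out = [o + alpha * m for o, m in zip(out, message)]
    return [math.tanh(o) for o in out]


# Two hypothetical suppliers with 2-dimensional risk features.
neighbors = [
    {"amount": 300.0, "features": [1.0, 0.0]},
    {"amount": 100.0, "features": [0.0, 1.0]},
]
identity = [[1.0, 0.0], [0.0, 1.0]]
h_ext = external_risk_vector(neighbors, identity)
```

The supplier with three times the transaction amount dominates the first component of the result, which is the behaviour the description attributes to the attention coefficient.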
[0125] S404. Within the internal compliance channel, aggregate all tax adjustment virtual nodes connected to the target enterprise's business document chain. Perform a weighted summation or pooling operation on the risk weights W of all tax adjustment virtual nodes to generate an internal risk feature vector. This step integrates the discrete and fragmented tax discrepancies identified in S3 (such as multiple overspending on business entertainment expenses and multiple questionable R&D expense claims) into a unified internal risk profile.
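As one possible reading of the pooling operation in S404, the risk weights of the tax adjustment virtual nodes can be reduced to a small fixed-length vector. The choice of sum, mean, and max as the three components is an assumption of this sketch, not something the description specifies.

```python
def internal_risk_vector(risk_weights):
    """Pool risk weights W into [sum, mean, max] (assumed layout)."""
    if not risk_weights:
        return [0.0, 0.0, 0.0]  # no tax adjustment nodes, no internal risk
    total = sum(risk_weights)
    return [total, total / len(risk_weights), max(risk_weights)]


# Hypothetical weights of three tax adjustment virtual nodes.
h_int = internal_risk_vector([0.2, 0.8, 0.5])
```

The sum reflects how many discrepancies accumulate, while the max keeps a single severe discrepancy from being averaged away.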
[0126] S405. The external risk feature vector $\mathbf{h}_{ext}$ and the internal risk feature vector $\mathbf{h}_{int}$ are concatenated or fused by weighting, and a dimensionality transformation is performed through a fully connected layer to obtain the final health feature vector $\mathbf{h}_{health}$ of the target enterprise node. The calculation formula is:
[0127] $\mathbf{h}_{health} = \mathrm{FC}\left(\mathbf{h}_{ext} \,\Vert\, \mathbf{h}_{int}\right)$
[0128] where $\Vert$ denotes concatenation and $\mathrm{FC}$ denotes the fully connected layer. $\mathbf{h}_{health}$ is a high-dimensional vector whose numerical distribution reflects the current financial and tax health of the enterprise. For example, abnormally high values in several dimensions of the vector indicate a specific type of compound risk, such as supply chain-related tax risk.
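The fusion step of S405 can be sketched as concatenation followed by a single linear layer. The weights and bias shown are arbitrary illustrative values rather than trained parameters, and the function name `fuse` is hypothetical.

```python
def fuse(h_ext, h_int, fc_weights, fc_bias):
    """Concatenate the two risk vectors and apply one fully connected layer."""
    z = list(h_ext) + list(h_int)  # concatenation
    return [
        sum(w * x for w, x in zip(row, z)) + b
        for row, b in zip(fc_weights, fc_bias)
    ]


# Hypothetical 2-dim external vector and 1-dim internal vector,
# projected to a 2-dim health feature vector.
fc_weights = [[1.0, 1.0, 1.0], [0.0, 0.0, 2.0]]
fc_bias = [0.0, 0.5]
h_health = fuse([0.5, 0.1], [1.0], fc_weights, fc_bias)
```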
[0129] S5. When the health feature vector indicates the existence of financial and tax risks, extract the business voucher nodes and their first-order neighbor nodes related to the financial and tax risks, form the local topology structure to be detected, and compare it with the preset compliance subgraph template to map the missing entity node types into rectification suggestion text, and generate a financial and tax health diagnosis report containing the rectification suggestion text.
[0130] In one embodiment, step S5 specifically includes the following sub-steps S501 to S505:
[0131] S501. Receive the updated health feature vector output in step S4. To determine whether there is a risk, input the health feature vector into a pre-trained classifier (such as a Softmax classifier or an XGBoost classifier). This classifier maps the high-dimensional vector to a probability distribution over specific risk classes, where k indexes the different risk types (e.g., risk of missing R&D expense evidence, risk of input tax reversal, risk of related-party transaction pricing). The following decision logic is executed:
[0132] If the probabilities of all categories are below the preset safety threshold, the enterprise is deemed healthy, and the process ends.
[0133] If the probability of a certain category (denoted as Type-K) exceeds the threshold, a deep diagnostic process for the Type-K risk is triggered, and the core node in the graph that causes the risk dimension to rise is identified, namely the risk source business voucher node.
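The decision logic of S501 can be illustrated with a softmax over classifier scores followed by a threshold test. The risk labels, scores, and threshold value below are hypothetical, and the sketch does not cover how the classifier itself is trained.

```python
import math


def softmax(logits):
    """Convert raw scores to a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def flagged_risk_types(probabilities, labels, safety_threshold):
    """Return the risk types whose probability exceeds the threshold."""
    return [lab for lab, p in zip(labels, probabilities) if p > safety_threshold]


LABELS = ["rd_evidence_missing", "input_tax_reversal", "related_party_pricing"]
probs = softmax([0.2, 2.5, 0.1])          # hypothetical classifier scores
flagged = flagged_risk_types(probs, LABELS, safety_threshold=0.5)
```

An empty result corresponds to the "enterprise is deemed healthy" branch; any returned label triggers the deep diagnostic process for that risk type.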
[0134] S502. For the locked risk source business voucher node (e.g., a large R&D expenditure voucher), perform neighborhood sampling in the graph to extract the business voucher node itself and all its directly connected first-order neighbor nodes. The first-order neighbor nodes typically include: supporting document nodes (e.g., invoices, contracts, project proposals, acceptance forms); related personnel nodes (e.g., the person submitting the expense report, the approver); and accounting subject nodes.
[0135] The extracted nodes and their connections together constitute the local topology to be detected, reflecting the actual state of the transaction.
[0136] S503. Based on the risk type (Type-K) identified in S501, retrieve the corresponding standard template from the pre-built compliance subgraph template library. For example, if the risk type is R&D expense deduction risk, then an R&D compliance template containing vouchers, project proposals, work hour records, R&D results reports, and their interconnections is invoked. A graph matching algorithm (such as the Maximum Common Subgraph (MCS) algorithm) is executed to structurally compare the local topology to be detected with the compliance subgraph template and identify the differences between the two.
[0137] S504. Based on the comparison in S503, calculate the set difference of the compliance subgraph template's node set relative to the local topology to be detected, and output the entity node types it contains. For example, the comparison shows that the template requires the existence of a "project initiation document" and "R&D personnel work hour records," while the enterprise's local topology contains only the "R&D Expenditure Voucher" and "Invoice" nodes. Therefore, the missing node types are determined to be "Project_Proposal" and "Timesheet".
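The set difference of S504 can be sketched as follows. This compares node type sets only and does not implement a Maximum Common Subgraph match; the template contents and type labels other than "Project_Proposal" and "Timesheet" are hypothetical and were chosen to reproduce the example in the text.

```python
def missing_node_types(template_types, local_types):
    """Node types required by the template but absent from the local topology."""
    return sorted(set(template_types) - set(local_types))


# Hypothetical R&D compliance template (node types only).
RD_TEMPLATE = {"RD_Voucher", "Invoice", "Project_Proposal", "Timesheet"}
# Node types found among the risk source voucher and its first-order neighbours.
local_topology = {"RD_Voucher", "Invoice"}

missing = missing_node_types(RD_TEMPLATE, local_topology)
```

Sorting the result gives a stable order for the report, since Python sets are unordered.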
[0138] S505: Access the suggestion mapping knowledge base and convert the missing node types identified in S504 into rectification suggestion text in natural language form. For example, Label: "Project_Proposal" → Text: "Please supplement the project initiation resolution or project initiation report for this R&D project." Finally, summarize the rectification suggestions for all risk points to generate a structured financial and tax health diagnosis report. This report not only points out the risks (such as the risk of tax adjustment for R&D expenses) but also provides specific action guidelines (such as suggesting supplementing and uploading the project initiation document corresponding to the voucher number).
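A minimal sketch of the suggestion mapping in S505 is shown below. The "Project_Proposal" text is taken from the example above; the "Timesheet" wording, the voucher number, the report layout, and the function name are assumptions made for illustration.

```python
SUGGESTION_MAP = {
    "Project_Proposal": (
        "Please supplement the project initiation resolution or project "
        "initiation report for this R&D project."
    ),
    # Hypothetical wording, not taken from the original text.
    "Timesheet": "Please supplement the R&D personnel work hour records.",
}


def build_report(voucher_no, risk_type, missing_types):
    """Map missing node types to suggestion text tied to a voucher number."""
    items = []
    for node_type in missing_types:
        text = SUGGESTION_MAP.get(node_type)
        if text is None:
            continue  # no mapping configured for this node type
        items.append(
            {"voucher_no": voucher_no, "missing": node_type, "suggestion": text}
        )
    return {"risk_type": risk_type, "suggestions": items}


report = build_report(
    "V-0001",  # hypothetical voucher number
    "rd_expense_deduction_risk",
    ["Project_Proposal", "Timesheet", "Unknown_Type"],
)
```

Attaching the voucher number to each suggestion is what lets the report point to a specific business document rather than only naming the risk.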
[0139] As shown in Figure 2, the present invention also provides a knowledge graph-based intelligent diagnostic system for enterprise financial and tax health. Based on the knowledge graph-based intelligent diagnostic method for enterprise financial and tax health described above, the system includes:
[0140] Acquisition module 1 is used to acquire multimodal financial and tax data of the target enterprise and extract entities and relationships using an entity extraction algorithm to construct an enterprise financial and tax knowledge graph; wherein, the enterprise financial and tax knowledge graph includes target enterprise nodes, business voucher nodes, accounting subject nodes, and associated enterprise nodes connected to the target enterprise nodes through transaction relationship edges;
[0141] Construction module 2 is used to construct a first set of rule nodes based on accounting standards and a second set of rule nodes based on tax regulations in the enterprise financial and tax knowledge graph, and to configure mapping relationship edges between the first and second rule nodes according to business scenarios; wherein, the first set of rule nodes establishes a settling relationship with the accounting subject nodes, and the second set of rule nodes is used to define tax restriction conditions;
[0142] The identification module 3 is used to identify the first rule node associated with the business voucher node through the accounting subject node, and retrieve the corresponding second rule node based on the mapping relationship, calculate the attribute difference value between the first rule node and the second rule node, generate a tax adjustment virtual node when the preset conditions are met, and assign a risk weight to the tax adjustment virtual node based on the attribute difference value.
[0143] Aggregation module 4 is used to input the processed enterprise financial and tax knowledge graph into the graph neural network model, so as to aggregate the risk characteristics of the related enterprise nodes to the target enterprise node through the transaction relationship edge, and fuse the risk weight of the tax adjustment virtual node to the target enterprise node to obtain the health feature vector of the target enterprise node;
[0144] The generation module 5 is used to extract business voucher nodes and their first-order neighbor nodes related to the financial and tax risks when the health feature vector indicates the existence of financial and tax risks, to form a local topology structure to be detected and compare it with a preset compliance subgraph template, so as to map the missing entity node type to rectification suggestion text and generate a financial and tax health diagnosis report containing the rectification suggestion text.
[0145] Each of the above modules is used to perform the corresponding steps in the knowledge graph-based intelligent diagnosis method for enterprise financial and tax health. The specific implementation methods are as described in the above method embodiments, and will not be repeated here.
[0146] As shown in Figure 3, the present invention also provides a computer device, which may be a server, and whose internal structure may be as shown in Figure 3. The computer device includes a processor, memory, network interface, and database connected via a system bus. The processor provides computational and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system, computer programs, and database. The internal memory provides the environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The database stores all data required for the process of the knowledge graph-based intelligent diagnostic method for enterprise financial and tax health. The network interface is used for communication with external terminals via a network connection. The computer program is executed by the processor to implement the knowledge graph-based intelligent diagnostic method for enterprise financial and tax health.
[0147] Those skilled in the art will understand that the structure shown in Figure 3 is merely a block diagram of a portion of the structure related to the present application and does not constitute a limitation on the computer equipment on which the present application is applied.
[0148] An embodiment of this application also provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements any of the above-described knowledge graph-based intelligent diagnostic methods for enterprise financial and tax health.
[0149] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by hardware related to computer program instructions. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the above method embodiments. Any references to memory, storage, databases, or other media provided in this application and used in the embodiments can include non-volatile and / or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), such as dynamic RAM (used as main storage) or static RAM (commonly used as cache memory). By way of illustration and not limitation, RAM has various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), and Rambus DRAM (RDRAM).
[0150] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, apparatus, article, or method. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, apparatus, article, or method that includes that element.
[0151] The above description is merely a preferred embodiment of the present invention and does not limit the patent scope of the present invention. Any equivalent structural or procedural transformations made based on the content of the present invention's specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of the present invention.
Claims
1. A knowledge graph-based intelligent diagnostic method for enterprise financial and tax health, characterized in that it comprises: S1. Obtain multimodal financial and tax data of the target enterprise and extract entities and relationships using an entity extraction algorithm to construct an enterprise financial and tax knowledge graph; wherein, the enterprise financial and tax knowledge graph includes target enterprise nodes, business voucher nodes, accounting subject nodes, and related enterprise nodes connected to the target enterprise nodes through transaction relationship edges; S2. Construct a first set of rule nodes based on accounting standards and a second set of rule nodes based on tax regulations in the enterprise financial and tax knowledge graph, and configure mapping relationship edges between the first and second rule nodes according to business scenarios; wherein, the first set of rule nodes establishes a settling relationship with the accounting subject nodes, and the second set of rule nodes is used to define tax restriction conditions; S3. Identify the first rule node associated with the business voucher node through the accounting subject node, and retrieve the corresponding second rule node based on the mapping relationship; calculate the attribute difference value between the first rule node and the second rule node, and generate a tax adjustment virtual node when the preset conditions are met; assign a risk weight to the tax adjustment virtual node based on the attribute difference value; S4. Input the enterprise financial and tax knowledge graph processed in S3 into the graph neural network model to aggregate the risk characteristics of the related enterprise nodes to the target enterprise node through the transaction relationship edge, and fuse the risk weight of the tax adjustment virtual node to the target enterprise node to obtain the health feature vector of the target enterprise node; S5. When the health feature vector indicates the existence of financial and tax risks, extract the business voucher nodes and their first-order neighbor nodes related to the financial and tax risks, form the local topology structure to be detected, and compare it with the preset compliance subgraph template to map the missing entity node types into rectification suggestion text, and generate a financial and tax health diagnosis report containing the rectification suggestion text.
2. The enterprise financial and tax health intelligent diagnosis method based on knowledge graph as described in claim 1, characterized in that, S1 specifically includes: S101. Access the target enterprise's business system, obtain multimodal financial and tax data, and perform deduplication and noise reduction preprocessing; wherein, the multimodal financial and tax data includes structured data, unstructured image data, and unstructured text data; S102. Use a deep learning optical character recognition model to identify key field information in the unstructured image data; use natural language processing technology to parse semantic information in the unstructured text data; S103. Instantiate the target enterprise node, business voucher node, accounting subject node and related enterprise node from the processed multimodal financial and tax data using the entity extraction model and rule mapping engine. S104. Construct the connections between nodes based on business logic, including the aggregation relationship edges from business voucher nodes to accounting subject nodes, and the transaction relationship edges connecting related enterprise nodes and target enterprise nodes. S105. Store the generated nodes and edges in the graph database, and establish a time index for the business voucher nodes to complete the construction of the enterprise financial and tax knowledge graph.
3. The enterprise financial and tax health intelligent diagnosis method based on knowledge graph as described in claim 1, characterized in that, S2 specifically includes: S201. Access the digitalized enterprise accounting standards, enterprise income tax law and its implementing regulations, parse the logical control clauses therein and transform them into structured rule entities; wherein, the structured rule entities include applicable objects, calculation base and deduction ratio attributes; S202. Instantiate the first set of rule nodes in the enterprise financial and tax knowledge graph based on accounting standards, and establish a settling relationship edge from the accounting subject node to the corresponding first rule node; S203. Instantiate the second set of rule nodes in the enterprise's financial and tax knowledge graph based on tax regulations, and configure calculation attribute parameters for each second rule node; wherein, the calculation attribute parameters include the pre-tax deduction ratio, the pre-tax deduction limit standard and limit ratio, the additional deduction ratio and the applicable tax rate level; S204. Based on the preset tax and accounting difference comparison table, identify the logical relationship between the first rule node and the second rule node, and establish a directed mapping relationship edge. S205. Configure a dynamic value retrieval interface for the second rule node that relies on dynamic data, so as to obtain global data from the enterprise financial and tax knowledge graph at runtime for limit calculation.
4. The enterprise financial and tax health intelligent diagnosis method based on knowledge graph as described in claim 1, characterized in that, S3 specifically includes: S301. Traverse the enterprise financial and tax knowledge graph, filter the business voucher nodes to be detected based on the time index, and locate the accounting subject nodes to which they belong along the aggregation relationship edge. S302. Perform a chained indexing operation to identify the first rule node associated with the accounting subject node, and index along the mapping relationship edge to the corresponding second rule node; S303. Call the calculation attribute parameters of the second rule node to calculate the tax-permitted amount for the business voucher node, and record the absolute value of the difference between the book amount of the business voucher node and the tax-permitted amount as the attribute difference value. S304. Compare the attribute difference value with a preset ignore threshold. When the attribute difference value is greater than the ignore threshold, instantiate a tax adjustment virtual node in the enterprise financial and tax knowledge graph and establish a connection edge from the business voucher node to the tax adjustment virtual node. S305. The attribute difference values are normalized to generate risk weights, and the risk weights are assigned to the tax adjustment virtual node; wherein, the risk weights are calculated using the following nonlinear function: $W = \dfrac{1}{1 + e^{-k(\Delta - \theta)}}$; wherein, W represents the risk weight, with a value range of (0,1]; $\Delta$ represents the attribute difference value; e represents the base of the natural logarithm; k represents the adjustment coefficient, used to control the sensitivity of the risk weight to changes in the difference value; $\theta$ represents the offset coefficient, used to control the center trigger position of the risk weight.
5. The enterprise financial and tax health intelligent diagnosis method based on knowledge graph as described in claim 4, characterized in that, In step S303, calculating the tax-permitted amount for the business voucher node specifically includes: Determine the restriction type of the second rule node. If the restriction type is a fixed-rate deduction type, multiply the book amount of the business voucher node by the deduction ratio in the calculation attribute parameter to obtain the tax-permitted amount. If the restriction type is a deduction limit type, then the global operating data in the enterprise's financial and tax knowledge graph is aggregated through the dynamic value retrieval interface as the calculation base, and the calculation base is multiplied by the limit ratio in the calculation attribute parameter to obtain the annual deduction limit value, and the annual deduction limit value is used as the amount allowed by the tax law.
6. The enterprise financial and tax health intelligent diagnosis method based on knowledge graph as described in claim 1, characterized in that, S4 specifically includes: S401. Load the pre-trained graph neural network model and initialize the feature embedding of each node in the enterprise financial and tax knowledge graph. S402. Define two message passing channels in the graph neural network model: a supply chain risk channel based on transaction relationship edges and an internal compliance channel based on tax adjustment virtual nodes. S403. In the supply chain risk channel, the attention coefficient of the related enterprise node to the target enterprise node is calculated using the graph attention mechanism, and the risk characteristics of the related enterprise node are aggregated according to the attention coefficient to obtain the external risk feature vector. S404. In the internal compliance channel, the risk weights of all tax adjustment virtual nodes connected to the target enterprise's business voucher chain are pooled to generate an internal risk feature vector. S405. The external risk feature vector and the internal risk feature vector are fused, and the dimensions are transformed through a fully connected layer to generate the updated health feature vector of the target enterprise node.
7. The enterprise financial and tax health intelligent diagnosis method based on knowledge graph as described in claim 6, characterized in that, In step S403, the formula for calculating the external risk feature vector is as follows: $\mathbf{h}_{ext} = \sigma\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij} \mathbf{W}_{s} \mathbf{h}_{j}\right)$; wherein, $\mathbf{h}_{ext}$ represents the aggregated external risk feature vector; $\sigma$ represents a nonlinear activation function; $\mathcal{N}(i)$ represents the set of associated enterprise nodes connected to the target enterprise node; $\alpha_{ij}$ represents the attention coefficient of the associated enterprise node j towards the target enterprise node i; $\mathbf{W}_{s}$ represents the learnable weight matrix of the model along the supply chain risk path; $\mathbf{h}_{j}$ represents the initial feature vector of the associated enterprise node j.
8. The enterprise financial and tax health intelligent diagnosis method based on knowledge graph as described in claim 1, characterized in that, S5 specifically includes: S501. Input the updated health feature vector into the classification model and calculate the risk category probability distribution; when the probability of a specific risk type exceeds the preset safety threshold, determine that there is a financial and tax risk and identify the risk source business voucher node that caused the risk. S502. Perform neighborhood sampling with the risk source business voucher node as the center, extract its first-order neighbor nodes and edges, and construct the local topology structure to be detected; wherein, the first-order neighbor nodes are entity nodes that have a direct reference relationship with the risk source business voucher node, including invoice nodes, contract nodes, supporting document nodes, approval personnel nodes and accounting subject nodes. S503. According to the specific risk type, call the corresponding compliance subgraph template from the pre-set library, and use a graph matching algorithm to compare the local topology to be detected with the compliance subgraph template; wherein, the compliance subgraph template is constructed based on the financial and tax audit compliance standards, and defines the set of entity node types and their connection topology relationships that are necessary to achieve compliance in a specific business scenario; S504. Calculate the difference set of the node set of the compliant subgraph template relative to the local topology structure to be detected, thereby determining the type of missing entity node; S505. Query the natural language operation guidance corresponding to the missing entity node type in the suggestion mapping knowledge base, combine it with the voucher attribute information of the risk source business voucher node, generate a rectification instruction containing specific business positioning as a rectification suggestion text, and summarize the rectification suggestion text to generate the financial and tax health diagnosis report.
9. A knowledge graph-based intelligent diagnostic system for enterprise financial and tax health, based on any one of claims 1 to 8, characterized in that, the system includes: an acquisition module, used to acquire multimodal financial and tax data of the target enterprise and extract entities and relationships using an entity extraction algorithm to construct an enterprise financial and tax knowledge graph; wherein, the enterprise financial and tax knowledge graph includes target enterprise nodes, business voucher nodes, accounting subject nodes, and related enterprise nodes connected to the target enterprise nodes through transaction relationship edges; a construction module, used to construct a first set of rule nodes based on accounting standards and a second set of rule nodes based on tax regulations in the enterprise financial and tax knowledge graph, and to configure mapping relationship edges between the first and second rule nodes according to business scenarios; wherein, the first set of rule nodes establishes a settling relationship with the accounting subject nodes, and the second set of rule nodes is used to define tax restriction conditions; an identification module, used to identify the first rule node associated with the business voucher node through the accounting subject node, and retrieve the corresponding second rule node based on the mapping relationship, calculate the attribute difference value between the first rule node and the second rule node, generate a tax adjustment virtual node when the preset conditions are met, and assign a risk weight to the tax adjustment virtual node based on the attribute difference value; an aggregation module, used to input the processed enterprise financial and tax knowledge graph into the graph neural network model, so as to aggregate the risk characteristics of the related enterprise nodes to the target enterprise node through the transaction relationship edge, and fuse the risk weight of the tax adjustment virtual node to the target enterprise node to obtain the health feature vector of the target enterprise node; and a generation module, used to extract business voucher nodes and their first-order neighbor nodes related to the financial and tax risks when the health feature vector indicates the existence of financial and tax risks, to form a local topology structure to be detected and compare it with a preset compliance subgraph template, so as to map the missing entity node types into rectification suggestion text and generate a financial and tax health diagnosis report containing the rectification suggestion text.
10. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, when the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 8.