A financial abnormal transaction tracing method and system based on multi-modal fusion
By using a multimodal fusion method of financial data, a financial spatiotemporal knowledge graph is constructed and a temporal neural network is used to solve the problems of insufficient multimodal data fusion and lack of time dynamic modeling, thereby realizing the automated tracing of abnormal financial transactions.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHANDONG XINGKONG INTELLIGENT TECHNOLOGY CO LTD
- Filing Date
- 2026-03-31
- Publication Date
- 2026-06-26
AI Technical Summary
Existing financial auditing methods struggle to deeply integrate multimodal financial data and fail to capture the dynamics of transaction timelines, resulting in delayed anomaly detection and difficulty in tracing hidden related-party transactions.
By using a multimodal fusion method, heterogeneous financial data from multiple sources is acquired, and OCR recognition and NLP parsing are performed to construct a financial spatiotemporal knowledge graph. A time-series graph neural network is used for anomaly scoring and to generate propagation paths.
It achieves deep fusion of multimodal data and dynamic time modeling, automatically traces the source and path of anomalies, and shortens audit time.
Figure CN122288902A_ABST (abstract drawing)
Description
Technical Field
[0001] This invention relates to the field of financial data processing technology, and in particular to a method and system for tracing abnormal financial transactions based on multimodal fusion.
Background Technology
[0002] As enterprises deepen their digital transformation, financial data exhibits three main characteristics: multimodality (e.g., structured financial ledgers, unstructured invoices, and contract texts), massive volume, and complex interrelationships. Current financial auditing mainly relies on sampling inspections, which has the following shortcomings: First, it is difficult to detect unknown anomaly patterns and has weak ability to correlate multimodal data; second, anomaly detection is lagging, usually occurring after the fact; and third, it is difficult to trace hidden related transactions, such as those that evade detection through multi-layered nesting and time misalignment.
[0003] In existing technologies, machine learning-based anomaly detection methods can identify statistical anomalies, but they struggle to integrate multimodal information and cannot model transaction topology. Knowledge graph-based methods can model transaction relationships, but they mostly use static graphs, ignore temporal evolution, and integrate unstructured data poorly. Therefore, there is an urgent need for an intelligent auditing system that can deeply integrate multimodal financial data, capture transaction time dynamics, and achieve automated source tracing and path generation.
Summary of the Invention
[0004] In view of the above problems, this disclosure provides a method and system for tracing abnormal financial transactions based on multimodal fusion to overcome or at least partially solve the above problems. The purpose is to solve technical problems such as insufficient depth of multimodal data fusion, lack of time dynamic modeling, and limited tracing capabilities.
[0005] The objective of this invention can be achieved through the following technical solutions. A first aspect of the technical solution of the present invention provides a method for tracing abnormal financial transactions based on multimodal fusion, comprising:
acquiring multi-source heterogeneous financial data, including structured financial ledgers, unstructured invoice images, and contract texts;
performing OCR recognition on the invoice images to extract invoice information, performing NLP semantic parsing on the contract texts to extract entities and relationships, and formatting the structured financial ledgers;
aligning information from different modalities that describes the same transaction entity, and generating a fused feature vector;
constructing a financial spatiotemporal knowledge graph from the aligned multimodal data, the knowledge graph including entity nodes, relation edges, and time attributes;
performing anomaly scoring on transaction nodes in the knowledge graph using a temporal graph neural network; and
for nodes whose anomaly scores exceed a preset threshold, performing source location and propagation path generation, and generating a structured report.
[0006] Furthermore, the NLP semantic parsing of the contract text to extract entities and relations includes: using the Financial-BERT model, pre-trained on a financial-domain corpus, to extract semantic features from the text; and using the TPLinker model to extract entities and relations jointly, outputting entity-relation triples that include the signatory, amount, and term.
[0007] Furthermore, the step of aligning information from different modalities describing the same transaction entity and generating a fused feature vector includes:
explicit key alignment: matching based on transaction ID, voucher number, or invoice number;
fuzzy attribute alignment: when explicit keys are missing or inconsistent, calculating a weighted similarity over amount, date, and transacting party name;
semantic vector alignment: for entities that cannot be aligned by the above methods, using Financial-BERT to generate text semantic vectors, which are matched against their nearest neighbors among the structured feature vectors.
Finally, an attention mechanism is used to fuse the features of each modality into a unified feature vector.
[0008] Furthermore, the financial spatiotemporal knowledge graph is stored in the form of quadruples, represented as $(s, r, o, t)$, where s and o are the head and tail entity nodes, r is the relation edge, and t is the timestamp; entity node types include subject node, transaction node, voucher node, and account node.
[0009] Furthermore, the system employs explicit key alignment, fuzzy attribute alignment, and semantic vector alignment, and integrates features from the various modalities through an attention mechanism.
[0010] Furthermore, the temporal graph neural network is TGAT. The node feature aggregation function is obtained through the temporal graph neural network. Based on the node feature aggregation function, the feature representation vector of each transaction node is obtained. This feature representation is then input into the anomaly scoring network to obtain the anomaly probability.
[0011] Furthermore, the node feature aggregation function computes $h_v^{(l)}$, the feature representation of node v at layer l, by aggregating over $\mathcal{N}(v)$, the set of neighbors of node v, using the attention coefficients, the temporal embeddings of the transaction edges, and the learnable weight matrix W.
[0012] The anomaly scoring network applies an MLP to the feature representation of each transaction node. MLP stands for Multilayer Perceptron; it transforms the original representation through multiple layers of nonlinear mapping, moving the features into a latent space more relevant to anomaly detection. It also performs dimensionality compression: the final output layer of the MLP has only one neuron (a scalar), compressing the high-dimensional features into a single real number, which provides the input for subsequent processing.
[0013] A second aspect of the technical solution of the present invention provides a financial abnormal transaction tracing system based on multimodal fusion, comprising:
a data acquisition module, used to collect multi-source heterogeneous financial data, including structured financial ledgers, unstructured invoice images, and contract texts;
a multimodal processing module, used to perform OCR recognition on the invoice images to extract invoice information, perform NLP semantic parsing on the contract texts to extract entities and relationships, and format the structured financial ledgers;
a cross-modal alignment and fusion module, used to align information from different modalities that describes the same transaction entity and generate a fused feature vector;
a knowledge graph construction module, used to construct a financial spatiotemporal knowledge graph from the aligned multimodal data, the knowledge graph including entity nodes, relation edges, and time attributes;
an anomaly detection module, used to score transaction nodes in the knowledge graph for anomalies using a temporal graph neural network;
an anomaly tracing module, used to locate the source and generate the propagation path for nodes whose anomaly scores exceed a preset threshold.
[0014] The anomaly tracing module includes:
a source identification unit, which uses backpropagation attribution or a PageRank algorithm with time decay to calculate the contribution of each historical node to the current anomalous node, and identifies nodes with high contribution as anomaly sources;
a path search unit, which uses a time-constrained depth-first search algorithm to search for all paths from the source to the anomalous node in the reverse temporal subgraph and calculates a score for each path;
a path pruning and sorting unit, which retains the K highest-scoring paths.
[0015] A third aspect of the present invention provides a computer-readable storage medium having instructions stored thereon, which, when executed by one or more processors, cause the processors to perform the financial anomaly transaction tracing method based on multimodal fusion as described in the first aspect.
[0016] The technical solution proposed in this application can bring the following beneficial effects: 1. This invention can deeply integrate multimodal financial data, and through cross-modal alignment and fusion, break down the barriers between structured and unstructured data to achieve unified representation and joint reasoning.
[0017] 2. This invention can capture the dynamic evolution of transaction time. By constructing a financial spatiotemporal knowledge graph containing time attributes, and using a time-series graph neural network to model the time-series characteristics of fund flows, it can effectively identify time-sensitive anomaly patterns.
[0018] 3. This invention can achieve source location and path generation. It adopts backpropagation attribution and time-series path search algorithms to automatically identify the source of the anomaly and generate the propagation path, which greatly shortens the tracing time.
[0019] The above description is merely an overview of the technical solution disclosed herein. In order to better understand the technical means of this disclosure and to implement it in accordance with the contents of the specification, and to make the above and other objects, features and advantages of this disclosure more apparent and understandable, specific embodiments of this disclosure are described below.
Attached Figure Description
[0020] Various other advantages and benefits will become apparent to those skilled in the art upon reading the following detailed description of preferred embodiments. The accompanying drawings are for illustrative purposes only and are not intended to limit the scope of this disclosure. Furthermore, the same reference numerals denote the same parts throughout the drawings. In the drawings:
Figure 1 is a flowchart illustrating the steps of a method for tracing abnormal financial transactions based on multimodal fusion, as provided in the embodiments of this specification.
Figure 2 is a schematic diagram of a financial abnormal transaction tracing system based on multimodal fusion, as provided in the embodiments of this specification.
Detailed Implementation
[0021] Exemplary embodiments of the present disclosure will now be described in more detail with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art. The technical solutions provided by various embodiments of this application will be described in detail below with reference to the accompanying drawings.
[0022] This application provides a method and system for tracing abnormal financial transactions based on multimodal fusion.
[0023] The first aspect of the technical solution of the present invention is shown in Figure 1, which is a flowchart of the steps of the method for tracing abnormal financial transactions based on multimodal fusion provided in the embodiments of this specification. The method includes:
S101. Obtain multi-source heterogeneous financial data, including structured financial ledgers, unstructured invoice images, and contract texts. The data acquisition module connects to the enterprise ERP system via an API to synchronize structured financial ledgers in real time, receives invoice images via a high-speed document scanner or other scanner, and receives contract texts (PDF / Word) via a file upload interface.
[0024] S102. Perform OCR recognition on the invoice image to extract invoice information. First, text detection is performed: DBNet (Differentiable Binarization Network) is used to detect text regions in the invoice image and output the coordinates of the text boxes. This algorithm is based on differentiable binarization and performs well in detecting curved and slanted text.
[0025] Then, text recognition is performed: a lightweight recognition network, SVTR (Scene Text Recognition with a Single Visual Model), is used in conjunction with an attention mechanism to recognize character sequences within text boxes. Specifically, the recognition accuracy for numbers, amounts, and dates is optimized to suit the characteristics of financial documents.
[0026] Finally, key information is structured: using the BiLSTM-CRF model, key fields are extracted from the identified text sequence, such as invoice code or number, invoice date, buyer / seller information, amount (including / excluding tax), tax amount, and product / service name.
[0027] The combination of DBNet and SVTR is highly adaptable to invoices of different formats and qualities, avoiding the need to design separate templates for each type of invoice. By using the BiLSTM-CRF model combined with contextual semantics, it can accurately extract structured information (such as distinguishing between "amount" and "tax") from messy text. The extracted structured information provides an accurate basis for subsequent comparison with financial ledgers.
[0028] The contract text is subjected to NLP semantic parsing to extract entities and relationships, and the structured financial ledger is formatted. The NLP semantic parsing includes: the Financial-BERT model, pre-trained on a financial-domain corpus, extracts semantic features from the text; the TPLinker model is then used to extract entities and relations jointly, outputting entity-relation triples that include the signatory, amount, and term.
[0029] Specifically, the BERT model is further pre-trained on financial contract texts to obtain Financial-BERT, which adapts the model to the professional terminology and expressions of the financial field. Three parallel decoders are then designed on top of the pre-trained model:
Entity extraction task: identify key entities in the contract, including contracting parties, amount, term, payment terms, liability for breach of contract, etc.;
Relationship extraction task: determine the semantic relationships between entities, such as "Party A - Payment - Amount" or "Contract - Association - Transaction";
Clause classification task: classify contract paragraphs into categories such as "Payment Clauses", "Delivery Clauses", and "Confidentiality Clauses".
The identified content is output in a structured manner: the parsing results are organized into JSON format, including entity lists, relation triples, and clause classification results.
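The structured JSON output described above can be illustrated with a minimal sketch. The field names (`entities`, `relations`, `clauses`), the `ContractParse` class, and the sample values are hypothetical and chosen only for illustration; the source states only that the output contains entity lists, relation triples, and clause classification results.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ContractParse:
    """Hypothetical container for one parsed contract (names are assumptions)."""
    entities: list = field(default_factory=list)   # e.g. {"type": "party", "text": "Party A"}
    relations: list = field(default_factory=list)  # triples as [head, relation, tail]
    clauses: list = field(default_factory=list)    # e.g. {"label": "Payment Clauses", "text": "..."}


def to_json(parse: ContractParse) -> str:
    """Serialize the parsing result to a JSON string."""
    return json.dumps(asdict(parse), ensure_ascii=False, indent=2)


# Illustrative values only; they are not taken from any real contract.
example = ContractParse(
    entities=[
        {"type": "party", "text": "Party A"},
        {"type": "amount", "text": "100000.00"},
    ],
    relations=[["Party A", "Payment", "100000.00"]],
    clauses=[{"label": "Payment Clauses",
              "text": "Party A shall pay the full amount within 30 days."}],
)

if __name__ == "__main__":
    print(to_json(example))
```

Keeping triples as plain lists means they survive a JSON round trip unchanged, which simplifies loading them into the knowledge graph later.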
[0030] The relation extraction uses the TPLinker model, which can handle both entity overlap and relation overlap issues simultaneously, making it suitable for complex clause nesting scenarios in contracts.
[0031] S103. Align information from different modalities describing the same transaction entity and generate a fused feature vector; To address the alignment issue when three modalities describe the same transaction entity, a multi-stage alignment strategy is adopted, including: Explicit key alignment is used to match based on transaction ID, voucher number, or invoice number; for example, "voucher number F2024001" in the ledger is associated with "invoice number F2024001" extracted by OCR.
[0032] Fuzzy attribute alignment: when explicit keys are missing or inconsistent, a weighted similarity is calculated over amount, date, and transacting party name. The calculation formula is as follows:
$$\mathrm{Sim}(A, B) = w_1\,\mathrm{Sim}_{amt}(a_A, a_B) + w_2\,\mathrm{Sim}_{date}(d_A, d_B) + w_3\,\mathrm{Sim}_{name}(n_A, n_B)$$
where A and B are the ledger record and the invoice information, respectively; $a_A, a_B$ are the monetary values; $d_A, d_B$ are the date values; $n_A, n_B$ are the names of the transacting parties; and $w_1, w_2, w_3$ are the weight coefficients, learned from the training data, with preset default values. $\mathrm{Sim}_{amt}$ denotes the amount similarity, based on the percentage of absolute error. $\mathrm{Sim}_{date}$ denotes the date similarity, based on exponential decay over the time difference, with reference to a 7-day span. $\mathrm{Sim}_{name}$ denotes the similarity of the transacting parties' names, expressed using edit distance.
Edit distance is a classic method for measuring the difference between two strings. It is defined as the minimum number of single-character edit operations required to transform one string into another. Edit distance is a non-negative integer; a larger distance indicates a greater difference. To normalize it to the [0,1] interval, the following formula is typically used:
$$\mathrm{Sim}_{name}(s_1, s_2) = 1 - \frac{\mathrm{ED}(s_1, s_2)}{\max(|s_1|, |s_2|)}$$
where $\mathrm{ED}(s_1, s_2)$ is the edit distance and $\max(|s_1|, |s_2|)$ is the maximum length of the two strings. This formula normalizes the edit distance by using the maximum length as an approximation of the maximum possible edit distance, yielding a similarity based on character editing operations.
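The weighted similarity can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented computation: the amount similarity is assumed to be one minus the relative absolute error, the date similarity is assumed to be exp(-|Δdays| / 7), and the weights (0.4, 0.3, 0.3) are placeholders, because the source's exact sub-formulas and default weights are not legible. Only the normalized edit-distance similarity follows the formula stated in the text. The record keys `amount`, `date`, and `party` are hypothetical.

```python
import math
from datetime import date


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """1 - ED / max length, as described in the text."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def amount_similarity(x: float, y: float) -> float:
    """ASSUMPTION: 1 - relative absolute error, clipped to [0, 1]."""
    denom = max(abs(x), abs(y))
    if denom == 0:
        return 1.0
    return max(0.0, 1.0 - abs(x - y) / denom)


def date_similarity(d1: date, d2: date, scale_days: float = 7.0) -> float:
    """ASSUMPTION: exponential decay exp(-|difference in days| / 7)."""
    return math.exp(-abs((d1 - d2).days) / scale_days)


def fuzzy_similarity(ledger: dict, invoice: dict,
                     weights=(0.4, 0.3, 0.3)) -> float:
    """Weighted sum of the three similarities; weights are placeholders."""
    w_amt, w_date, w_name = weights
    return (w_amt * amount_similarity(ledger["amount"], invoice["amount"])
            + w_date * date_similarity(ledger["date"], invoice["date"])
            + w_name * name_similarity(ledger["party"], invoice["party"]))
```

In practice a pair would be accepted as aligned when `fuzzy_similarity` exceeds some cut-off; the source does not state that cut-off, so none is assumed here.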
[0033] Semantic vector alignment: for entities that cannot be aligned using the methods described above, Financial-BERT is used to generate text semantic vectors, which are then matched against their nearest-neighbor vectors. This includes the following.
$v_{text}$ represents the semantic vector of text-modality data, obtained by encoding unstructured text (such as contract terms, invoice summaries, transaction notes, etc.) with the Financial-BERT model. For example, for the contract text fragment "The buyer shall pay the full amount within 30 days after the goods arrive, and a penalty of 0.05% per day shall be paid for any overdue payment," the text is passed forward through Financial-BERT and the last-layer output vector at the [CLS] position is taken as a fixed-length vector.
$v_{struct}$ represents the semantic vector of structured transaction data, obtained by encoding the structured attributes of the transaction (amount, time, account, etc.) with an MLP.
$\theta$ is the preset similarity threshold, a real number between 0 and 1; in this embodiment, it is set to 0.7 based on experience. When $\mathrm{sim}(v_{text}, v_{struct}) \ge \theta$, the two vectors are considered sufficiently similar and the match is deemed successful, meaning that the text and the transaction describe the same entity. When $\mathrm{sim}(v_{text}, v_{struct}) < \theta$, the similarity is insufficient and the pair is judged a mismatch.
After explicit key alignment, fuzzy attribute alignment, and semantic vector alignment are performed, the features of each modality are fused through an attention mechanism to obtain a unified feature vector. The fusion combines the aligned explicit-key modality information, the aligned fuzzy-attribute modality information, and the aligned semantic-vector modality information, using a learnable attention weight matrix that automatically adjusts the importance of each modality.
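The threshold matching and attention fusion can be sketched in plain Python. Several points are assumptions: the source does not name the similarity measure, so cosine similarity is used here; the vectors would come from Financial-BERT and an MLP but are passed in as plain lists; and the attention is reduced to a softmax over one score per modality, whereas the source describes a learnable weight matrix. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def nearest_match(text_vec, struct_vecs, threshold=0.7):
    """Return (key, similarity) of the nearest structured vector,
    or None when even the best candidate is below the threshold."""
    best_key, best_sim = None, -1.0
    for key, vec in struct_vecs.items():
        sim = cosine(text_vec, vec)
        if sim > best_sim:
            best_key, best_sim = key, sim
    if best_key is None or best_sim < threshold:
        return None
    return best_key, best_sim


def attention_fuse(features, scores):
    """Softmax the per-modality scores and return the weighted sum of the
    modality feature vectors together with the attention weights.
    ASSUMPTION: scores stand in for the output of a learned attention layer."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    fused = [sum(w * f[i] for w, f in zip(weights, features))
             for i in range(dim)]
    return fused, weights
```

For example, `nearest_match([1.0, 0.0], {"t1": [1.0, 0.1], "t2": [0.0, 1.0]})` selects `"t1"`, while a candidate set containing only `"t2"` yields `None` because its similarity falls below 0.7.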
[0034] S104. Construct a financial spatiotemporal knowledge graph based on the aligned multimodal data, wherein the knowledge graph includes entity nodes, relation edges, and time attributes. The financial spatiotemporal knowledge graph is stored in the form of quadruples, represented as $G = \{(s, r, o, t) \mid s, o \in E,\ r \in R,\ t \in T\}$, where s and o are the head and tail entity nodes, r is the relation edge, and t is the timestamp; E is the set of entity nodes, R is the set of relation edges, and T is the set of timestamps. Entity node types include subject nodes, transaction nodes, voucher nodes, and account nodes.
[0035] Based on the aligned multimodal data, a financial knowledge graph with a time dimension is constructed, containing three core elements (entity nodes, relation edges, and time attributes).
Node definitions, for example, include the following nodes:
Subject node: enterprise, individual, bank account, department (attributes: name, type, credit rating, etc.);
Transaction node: a single transaction or contract (attributes: amount, time, summary, multimodal feature vector);
Voucher node: invoices, receipts, contract documents (attributes: file path, OCR result, NLP result);
Account node: accounting account (attributes: account code, category, balance direction).
Edge definitions, for example, include the following edges:
Transaction edge: Subject → Transaction → Subject, indicating the flow of funds (attributes: amount, timestamp);
Attribution edge: Transaction → Account, indicating the accounting account in which the transaction is recorded;
Voucher edge: Transaction → Voucher, representing the original voucher of the transaction;
Association edge: relationships between subjects (such as shareholding, guarantee, related parties).
S105. Perform anomaly scoring on transaction nodes in the knowledge graph based on a temporal graph neural network. The temporal graph neural network is TGAT (Temporal Graph Attention Network). The node feature aggregation function is obtained through the temporal graph neural network. Based on the node feature aggregation function, the feature representation vector of each transaction node is obtained. This feature representation is then input into the anomaly scoring network to obtain the anomaly probability.
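Quadruple storage can be illustrated with a small in-memory sketch. The class and method names (`Quad`, `TemporalKG`, `edges_between`, `incoming`) are hypothetical; the source mentions importing the data into a database and building an index but does not specify which one, so a plain list stands in for it here.

```python
from datetime import datetime
from typing import NamedTuple


class Quad(NamedTuple):
    """One (s, r, o, t) fact: head node, relation, tail node, timestamp."""
    s: str
    r: str
    o: str
    t: datetime


class TemporalKG:
    """Minimal in-memory store of quadruples (illustrative only)."""

    def __init__(self):
        self.quads = []

    def add(self, s, r, o, t):
        self.quads.append(Quad(s, r, o, t))

    def edges_between(self, start, end):
        """All facts whose timestamp lies in the closed window [start, end]."""
        return [q for q in self.quads if start <= q.t <= end]

    def incoming(self, node, before):
        """Facts pointing at `node` that occurred strictly before `before`;
        this is the kind of lookup that reverse tracing relies on."""
        return [q for q in self.quads if q.o == node and q.t < before]
```

A transaction edge such as Subject → Transaction → Subject would be stored as two quadruples sharing the transaction node, each carrying the transaction's timestamp.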
[0036] The node feature aggregation function computes $h_v^{(l)}$, the feature representation of node v at layer l, by aggregating over $\mathcal{N}(v)$, the set of neighbors of node v, using the attention coefficients, the temporal embeddings of the transaction edges, and the learnable weight matrix W.
[0037] The anomaly scoring network applies an MLP to the feature representation of each transaction node. MLP stands for Multilayer Perceptron; it transforms the original representation through multiple layers of nonlinear mapping, moving the features into a latent space more relevant to anomaly detection. It also performs dimensionality compression: typically, the final output layer of the MLP has only one neuron (a scalar), compressing the high-dimensional features into a single real number, which provides the input for subsequent processing.
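One possible concrete form of the aggregation and scoring steps is given below as a sketch. It is an assumption built from the quantities the text names (layer-wise node representation, neighbor set, attention coefficient, temporal embedding of the transaction edge, learnable weight matrix W), not the formula of the source. The symbols $\alpha_{vu}$, $\Phi(t_{vu})$, the concatenation operator $\|$, the activation $\sigma$, the final layer index L, and the closing sigmoid are all introduced here for illustration.

```latex
% Assumed aggregation: attention-weighted sum over neighbors of the
% neighbor representation concatenated with the edge's temporal embedding.
h_v^{(l)} = \sigma\!\Big( \sum_{u \in \mathcal{N}(v)} \alpha_{vu}\, W
            \big[\, h_u^{(l-1)} \,\|\, \Phi(t_{vu}) \,\big] \Big)

% Assumed scoring: the MLP compresses the final representation to a scalar,
% and a sigmoid maps that scalar to an anomaly probability in (0, 1).
s_v = \mathrm{sigmoid}\big( \mathrm{MLP}( h_v^{(L)} ) \big)
```

Under this reading, the scalar produced by the MLP is the "single real number" mentioned in the text, and the sigmoid is the subsequent processing that turns it into the anomaly probability compared against the preset threshold.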
[0038] TGAT considers both graph topology and temporal order, enabling it to capture time-sensitive anomaly patterns such as "multiple scattered transfers within a short period." Compared with traditional GNNs, TGAT is more sensitive to temporal perturbations and can detect anomalies hidden by temporal misalignment.
[0039] S106. Locate the source and generate the propagation path for nodes with abnormal scores exceeding the preset threshold, and generate a structured report.
[0040] When the anomaly score exceeds the threshold, the tracing module is triggered to automatically locate the source of the anomaly and generate a propagation path; in this embodiment, the preset threshold is 0.8.
[0041] First, the source of the anomaly is identified using the backpropagation attribution algorithm. Starting from the anomalous node, the algorithm traces back against the direction of the transactions: gradient propagation is used to calculate the gradient of the anomaly score with respect to each historical node, and nodes with larger gradients have contributed more to the current anomaly. This is combined with a PageRank variant that gives priority to propagation paths that are close in time. The variant runs the PageRank algorithm with time decay on the reversed subgraph, using the following quantities:
PR(u) is the PageRank score of node u, that is, the probability that node u is the source of an anomaly; PR(v) is the PageRank score of node v.
d is the damping factor, with a value range of (0,1); in this embodiment, it is set to 0.85. The random walker continues to move along an edge of the graph with probability d and jumps to a random node with probability 1 − d. In this embodiment, d controls the continuity of anomaly propagation: the larger d is, the more likely the anomaly is propagated along the actual transaction path; the smaller d is, the more easily the propagation is interrupted by randomness, which helps to prevent excessively long false paths.
The set of outgoing neighbors of node u: in the original graph, outgoing edges represent transactions flowing from u to other nodes. In the reverse tracing scenario, however, a reverse graph is constructed (edges are reversed), so the "outgoing neighbors" here are actually the source nodes of the funds flowing into u. That is, if there is a transaction v→u, then there is an edge u→v in the reverse graph.
The set of incoming neighbors of node v: in the reverse graph, incoming edges represent transactions flowing from v to other nodes (i.e., outgoing edges in the original graph).
The out-degree of v (in the original graph) is used to normalize the score of v and to avoid the score being monopolized by a few high-degree nodes.
The time decay factor is governed by a decay coefficient λ (a real number greater than zero), which controls the rate of time decay: the larger the value, the more severe the penalty for the time difference. Δt is the transaction time difference from node v to node u; the unit can be days, hours, etc.
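A time-decayed PageRank over the reversed subgraph can be sketched as below. The update rule PR(u) = (1 − d)/N + d · Σ PR(v)/outdeg(v) · exp(−λ·Δt) is an assumed form assembled from the quantities described above, not the formula of the source. As a simplification, out-degree is counted in the graph that is passed in, and the decay coefficient default `lam=0.1` is a placeholder. The function name and edge layout are hypothetical.

```python
import math


def time_decay_pagerank(edges, d=0.85, lam=0.1, iters=50):
    """Score candidate anomaly sources on a REVERSED transaction graph.

    edges: list of (src, dst, dt), already reversed so that score flows from
           the anomalous node back toward earlier counterparties;
           dt is the time difference of the underlying transaction.
    ASSUMED update: PR(u) = (1-d)/N + d * sum PR(v)/outdeg(v) * exp(-lam*dt).
    Because of the decay term the scores need not sum to 1; they are
    relative contributions, not a normalized distribution.
    """
    nodes = sorted({n for s, t, _ in edges for n in (s, t)})
    if not nodes:
        return {}
    n = len(nodes)
    out_deg = {v: 0 for v in nodes}
    for s, _, _ in edges:
        out_deg[s] += 1
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: (1.0 - d) / n for v in nodes}
        for s, t, dt in edges:
            nxt[t] += d * pr[s] / out_deg[s] * math.exp(-lam * dt)
        pr = nxt
    return pr


if __name__ == "__main__":
    # Anomalous node "C" received funds from "B" (1 day earlier)
    # and from "X" (30 days earlier); edges are already reversed.
    scores = time_decay_pagerank([("C", "B", 1.0), ("C", "X", 30.0)])
    print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

With this rule, a counterparty whose transaction is close in time to the anomaly receives more score than one whose transaction is distant, which matches the stated preference for temporally close propagation paths.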
[0042] Then, a propagation path search is performed: using a shortest path algorithm with temporal constraints, all possible paths from the source to the anomalous node are searched in the reverse temporal subgraph. Temporal DFS is employed: a depth-first search that satisfies the time-increasing constraint (transaction times along the path must strictly increase). A path scoring function is introduced to calculate the anomaly propagation strength of each candidate path. Here P represents a candidate path, $P = \{e_1, e_2, \ldots, e_m\}$, where m is the number of edges in the path; each $e \in P$ is a directed edge in path P, corresponding to a specific transaction. The score is computed from the anomaly probability of each edge e, the path length $|P|$ (the number of edges contained in path P, i.e., m), and the product of the weights of all edges on the path, where the edge weight is the logarithm of the amount.
[0043] Next, path pruning and sorting are performed, retaining the top 3-5 highest-scoring paths and removing redundant and invalid paths; the selected paths are then converted into graph structure data for report generation.
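The search, scoring, and pruning steps can be sketched together. The search below walks forward from the source to the anomalous node with strictly increasing transaction times, which enumerates the same paths as a backward walk on the reversed subgraph. The scoring function is an assumption: the geometric mean over the path of (edge anomaly probability × log(1 + amount)); the source's exact formula is not legible. The edge layout `(u, v, time, p_anom, amount)`, the `max_len` cap, and the function names are hypothetical.

```python
import heapq
import math


def temporal_paths(edges, source, target, max_len=6):
    """Yield every path (list of edges) from source to target whose
    transaction times strictly increase along the path.

    edges: list of (u, v, time, p_anom, amount) in original direction u -> v.
    """
    adj = {}
    for e in edges:
        adj.setdefault(e[0], []).append(e)

    def dfs(node, last_time, path, visited):
        if node == target and path:
            yield list(path)
            return
        if len(path) >= max_len:
            return
        for e in adj.get(node, []):
            _, v, t, _, _ = e
            if t > last_time and v not in visited:
                path.append(e)
                visited.add(v)
                yield from dfs(v, t, path, visited)
                path.pop()
                visited.remove(v)

    yield from dfs(source, -math.inf, [], {source})


def path_score(path):
    """ASSUMED score: geometric mean of p_anom * log(1 + amount) per edge."""
    prod = 1.0
    for _, _, _, p_anom, amount in path:
        prod *= p_anom * math.log1p(amount)
    return prod ** (1.0 / len(path))


def top_k_paths(edges, source, target, k=3):
    """Keep the k highest-scoring time-respecting paths."""
    return heapq.nlargest(k, temporal_paths(edges, source, target),
                          key=path_score)
```

Taking the geometric mean rather than the raw product keeps long paths from being penalized merely for having more edges; whether the source normalizes by path length in this way is not recoverable, so this choice is illustrative only.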
[0044] A second aspect of the technical solution of the present invention provides a financial abnormal transaction tracing system based on multimodal fusion, comprising:
a data acquisition module, used to collect multi-source heterogeneous financial data, including structured financial ledgers, unstructured invoice images, and contract texts;
a multimodal processing module, used to perform OCR recognition on the invoice images to extract invoice information, perform NLP semantic parsing on the contract texts to extract entities and relationships, and format the structured financial ledgers;
a cross-modal alignment and fusion module, used to align information from different modalities that describes the same transaction entity and generate a fused feature vector;
a knowledge graph construction module, used to construct a financial spatiotemporal knowledge graph from the aligned multimodal data, the knowledge graph including entity nodes, relation edges, and time attributes;
an anomaly detection module, used to score transaction nodes in the knowledge graph for anomalies using a temporal graph neural network;
an anomaly tracing module, used to locate the source and generate the propagation path for nodes whose anomaly scores exceed a preset threshold.
[0045] The multimodal processing module includes an OCR recognition unit and an NLP parsing unit. The OCR recognition unit uses DBNet to detect text regions in the invoice image. This algorithm performs well in detecting curved and tilted text. The SVTR lightweight recognition network is used for text recognition, with specific optimizations for numbers, amounts, and dates. Finally, the BiLSTM-CRF model is used to extract key fields from the recognized text sequence, such as invoice code, invoice date, amount, and transacting party.
[0046] The NLP parsing unit first uses 20,000 to 100,000 real financial contracts to perform domain-adaptive pre-training on the BERT model to obtain the Financial-BERT model; then, the TPLinker model is used to extract entities and relations. TPLinker can handle the problem of overlapping entities and relations and is suitable for complex nesting of contract terms; the output is an entity-relation triple containing the contracting parties, amount, term, payment terms, etc.
[0047] The anomaly tracing module includes:
a source identification unit, which uses backpropagation attribution or a PageRank algorithm with time decay to calculate the contribution of each historical node to the current anomalous node, and identifies nodes with high contribution as anomaly sources;
a path search unit, which uses a time-constrained depth-first search algorithm to search for all paths from the source to the anomalous node in the reverse temporal subgraph and calculates a score for each path;
a path pruning and sorting unit, which retains the K highest-scoring paths.
[0048] The workflow of the financial abnormal transaction tracing system based on multimodal fusion is as follows:
(1) Data collection and preprocessing: structured ledgers are synchronized from the financial system on a regular basis; newly uploaded invoice images enter the OCR queue; newly uploaded contract texts enter the NLP queue.
(2) Multimodal feature extraction: the OCR module extracts key information from the invoices; the NLP module outputs entity-relation triples; the structured ledger is formatted directly.
(3) Cross-modal alignment: explicit matching based on transaction ID and voucher number; fuzzy attribute alignment for unmatched data; semantic vector alignment; generation of fused feature vectors.
(4) Knowledge graph construction: the aligned data is imported into the database, nodes and edges are constructed with timestamps attached, and an index is created.
(5) Anomaly detection: subgraph sampling is performed on newly added transaction nodes, and the TGAT model calculates the anomaly score. If the score exceeds 0.8, tracing is triggered.
[0049] (6) Anomaly tracing: backpropagation attribution locates the source, the temporal path search generates the propagation paths, and the paths are scored and sorted.
[0050] (7) Report generation: Integrate the results, generate anomaly descriptions in natural language, package multimodal evidence, and output a structured audit report.
[0051] A third aspect of the present invention provides a computer-readable storage medium having instructions stored thereon, which, when executed by one or more processors, cause the processors to perform the financial anomaly transaction tracing method based on multimodal fusion as described in the first aspect.
[0052] As can be seen from the above embodiments, the present invention can deeply integrate multimodal financial data, and break down the barriers between structured and unstructured data through cross-modal alignment and fusion, thereby achieving unified representation and joint reasoning.
[0053] As can be seen from the above embodiments, the present invention can capture the dynamic evolution of transaction time, construct a financial spatiotemporal knowledge graph containing time attributes, and use a time-series graph neural network to model the time series characteristics of fund flow, effectively identifying time-sensitive anomaly patterns.
[0054] As can be seen from the above embodiments, the present invention can achieve source location and path generation. By adopting backpropagation attribution and time-series path search algorithms, it can automatically identify the source of the anomaly and generate the propagation path, which greatly shortens the tracing time.
[0055] This embodiment can divide the method into functional modules based on the above method example. For example, each function can be assigned to a separate module, or two or more functions can be integrated into one processing module. The integrated module can be implemented in hardware. It should be noted that the module division in this embodiment is illustrative and only represents one logical functional division. In actual implementation, there may be other division methods.
[0056] When the functions are divided so that each module corresponds to one function, the system may include: a data acquisition module, a multimodal processing module, a cross-modal alignment and fusion module, a knowledge graph construction module, an anomaly detection module, and an anomaly tracing module. It should be noted that all relevant content of each step involved in the above method embodiments can be referenced from the functional descriptions of the corresponding functional modules, and will not be repeated here.
[0057] This embodiment also provides a computer-readable storage medium (including but not limited to disk storage, CD-ROM, optical storage, etc.) storing computer program code. When the computer program code is run on a computer, the computer executes the above-mentioned related method steps to implement the financial abnormal transaction tracing method based on multimodal fusion provided in the above embodiment.
[0058] This embodiment also provides a computer program product. When the computer program product is run on a computer, it causes the computer to perform the aforementioned steps to implement the multimodal fusion-based method for tracing abnormal financial transactions provided in the above embodiment. The beneficial effects of the above embodiments can be found in the corresponding methods described above, and will not be repeated here.
[0059] Through the above description of the embodiments, those skilled in the art will understand that, for the sake of convenience and brevity, only the division of the above functional modules is used as an example. In actual applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
[0060] In the embodiments provided in this application, it should be understood that the disclosed apparatus and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. For instance, the division of modules or units is merely a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not executed. Furthermore, the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between devices or units may be electrical, mechanical, or other forms. In the description of this disclosure, it should be understood that if terms such as "upper," "lower," "front," "rear," "left," and "right" are used to indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings, they are only for the convenience of describing the invention and simplifying the description, and do not indicate or imply that the indicated position or element must have a specific orientation, or be constructed and operated in a specific orientation, and therefore should not be construed as a limitation of this disclosure.
[0061] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes the element.
[0062] The above are merely embodiments of this disclosure and are not intended to limit the scope of this disclosure. Various modifications and variations can be made to this disclosure by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this disclosure should be included within the scope of the claims of this disclosure.
Claims
1. A method for tracing abnormal financial transactions based on multimodal fusion, characterized by comprising: acquiring multi-source heterogeneous financial data, including structured financial ledgers, unstructured invoice images, and contract texts; performing OCR recognition on the invoice images to extract invoice information, performing NLP semantic parsing on the contract texts to extract entities and relationships, and formatting the structured financial ledgers; aligning information from different modalities that describes the same transaction entity and generating a fused feature vector; constructing a financial spatiotemporal knowledge graph based on the aligned multimodal data, the knowledge graph including entity nodes, relation edges, and time attributes; performing anomaly scoring on transaction nodes in the knowledge graph based on a temporal graph neural network; and, for nodes whose anomaly scores exceed a preset threshold, locating the source, generating the propagation path, and producing a structured report.
2. The method for tracing abnormal financial transactions based on multimodal fusion according to claim 1, characterized in that performing NLP semantic parsing on the contract text to extract entities and relations includes: extracting semantic features from the text using a Financial-BERT model pre-trained on a financial-domain corpus; and jointly extracting entities and relations using a TPLinker model, which outputs entity-relation triples that include the signatory, amount, and term.
3. The method for tracing abnormal financial transactions based on multimodal fusion according to claim 1, characterized in that the step of aligning information from different modalities describing the same transaction entity and generating a fused feature vector includes: explicit key alignment, which matches records on transaction ID, voucher number, or invoice number; fuzzy attribute alignment, which calculates a weighted similarity of amount, date, and transacting party name when explicit keys are missing or inconsistent; semantic vector alignment, in which Financial-BERT generates text semantic vectors that are matched to the structured feature vectors by nearest-neighbor search; and, finally, fusing the features from each modality through an attention mechanism to obtain a unified feature vector.
4. The method for tracing abnormal financial transactions based on multimodal fusion according to claim 1, characterized in that the financial spatiotemporal knowledge graph is stored in the form of quadruples (s, r, o, t), where s and o are the head and tail entity nodes, r is the relation edge, and t is the timestamp; entity node types include subject node, transaction node, voucher node, and account node.
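As an illustration of quadruple storage, the following Python sketch holds each fact as a head entity, relation, tail entity, and timestamp, and indexes facts by head node so that time-windowed lookups are straightforward. The class names `Quad` and `TemporalGraph`, the field order, and the use of integer timestamps are assumptions made for this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Quad:
    s: str  # head entity node
    r: str  # relation edge
    o: str  # tail entity node
    t: int  # timestamp (assumed here to be seconds since the Unix epoch)


class TemporalGraph:
    """In-memory store of quadruples, indexed by head entity (illustrative only)."""

    def __init__(self):
        self._out = defaultdict(list)

    def add(self, quad):
        self._out[quad.s].append(quad)

    def outgoing(self, node, start, end):
        """Quadruples leaving `node` with start <= t <= end, ordered by time."""
        hits = [q for q in self._out.get(node, []) if start <= q.t <= end]
        return sorted(hits, key=lambda q: q.t)
```

A production system would more likely rely on a graph database with a timestamp index; the in-memory dictionary only shows the shape of the data.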
5. The method for tracing abnormal financial transactions based on multimodal fusion according to claim 3, characterized in that explicit key alignment, fuzzy attribute alignment, and semantic vector alignment are employed, and features from the various modalities are integrated through an attention mechanism.
6. The method for tracing abnormal financial transactions based on multimodal fusion according to claim 1, characterized in that the temporal graph neural network is TGAT; a node feature aggregation function is obtained through the temporal graph neural network; the feature representation vector of each transaction node is obtained based on the node feature aggregation function; and the feature representation vector is input into an anomaly scoring network to obtain the anomaly probability.
7. The method for tracing abnormal financial transactions based on multimodal fusion according to claim 6, characterized in that the node feature aggregation function is expressed in terms of the following quantities: the feature representation of node v at layer l; the set of neighbors of node v; the attention coefficient; the temporal embedding of the transaction edge; and the learnable weight matrix W.
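The quantities listed in claim 7 are compatible with an attention-weighted neighborhood aggregation of the general shape below. This is an illustrative sketch only, not the claimed formula: the symbols $h$, $\mathcal{N}$, $\alpha$, and $\Phi$, the activation $\sigma$, the concatenation $\Vert$, and the use of a time difference $t_v - t_u$ as the argument of the temporal embedding are all assumptions.

```latex
h_v^{(l)} = \sigma\!\left( \sum_{u \in \mathcal{N}(v)} \alpha_{vu}\, W \left[\, h_u^{(l-1)} \,\Vert\, \Phi(t_v - t_u) \,\right] \right)
```

Here $h_v^{(l)}$ would be the feature representation of node $v$ at layer $l$, $\mathcal{N}(v)$ its neighbor set, $\alpha_{vu}$ the attention coefficient, $\Phi(\cdot)$ the temporal embedding of the transaction edge, and $W$ the learnable weight matrix.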
8. A financial anomaly transaction tracing system based on multimodal fusion, characterized by comprising: a data acquisition module, used to collect multi-source heterogeneous financial data, including structured financial ledgers, unstructured invoice images, and contract texts; a multimodal processing module, used to perform OCR recognition on the invoice images to extract invoice information, perform NLP semantic parsing on the contract texts to extract entities and relationships, and format the structured financial ledgers; a cross-modal alignment and fusion module, used to align information from different modalities that describes the same transaction entity and generate a fused feature vector; a knowledge graph construction module, used to construct a financial spatiotemporal knowledge graph based on the aligned multimodal data, the knowledge graph including entity nodes, relation edges, and time attributes; an anomaly detection module, used to perform anomaly scoring on transaction nodes in the knowledge graph based on a time-series graph neural network; and an anomaly tracing module, used to locate the source and generate the propagation path for nodes whose anomaly scores exceed a preset threshold.
9. The financial anomaly transaction tracing system based on multimodal fusion according to claim 8, characterized in that the anomaly tracing module includes: a source identification unit, which uses backpropagation attribution or a time-decayed PageRank algorithm to calculate the contribution of each historical node to the current anomalous node, nodes with high contributions being identified as anomaly sources; a path search unit, which uses a time-constrained depth-first search to find all paths from the source to the anomalous node in the reverse time-series subgraph and calculates a score for each path; and a path pruning and sorting unit, which retains the K highest-scoring paths.
10. A computer-readable storage medium, characterized in that it stores instructions that, when executed by one or more processors, cause the processors to perform the financial anomaly transaction tracing method based on multimodal fusion as described in any one of claims 1-7.