A material full-lifecycle management system based on underlying financial data
By constructing a material lifecycle management system based on underlying financial data, the invention solves the problem of disordered material lifecycle nodes, accurately restores material lifecycle paths and identifies anomalies, and improves supervision precision and risk identification capability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- JINGHANG IND TECHNOLOGY (SHANDONG) CO LTD
- Filing Date
- 2026-03-04
- Publication Date
- 2026-06-19
AI Technical Summary
In large enterprises and research institutions, there are a large number of specialized materials with small value but complex circulation paths that are not managed as independent asset cards in the financial system. This leads to disorder or missing nodes in the material life cycle, making them difficult to identify using traditional methods and affecting turnover rate assessment and compliance supervision of special funds.
A full lifecycle management system for materials is built based on underlying financial data. Through a financial data collection module, a semantic fingerprint generation module, a financial behavior graph construction module, and a lifecycle path reconstruction module, a unique lifecycle master path is generated and abnormal sub-paths are identified. Risk assessment is conducted using the lifecycle integrity index.
It has achieved a structured reconstruction of the material lifecycle path, significantly improved the accuracy and completeness of lifecycle reconstruction, enhanced the sensitivity and accuracy of anomaly identification, and provided reliable data support for refined supervision and risk warning.

Figure CN122243380A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of financial data analysis and materials management technology, and specifically to a materials lifecycle management system based on underlying financial data.
Background Technology
[0002] In large corporations, research institutions, and scenarios with multiple concurrent projects, there is a large amount of specialized materials with relatively small monetary value but complex circulation paths, such as testing fixtures, phased testing modules, and cross-project turnover components. These materials are typically recorded in the financial system as expenses, low-value consumables, or construction-in-progress materials, without an independent asset card management chain. Furthermore, different business systems exchange data only at the summary level and lack correlation modeling of underlying financial data such as accounting vouchers, journal entries, auxiliary accounting items, and project numbers. As a result, when the same materials are transferred across projects, returned, re-issued, or reversed in accounting, hidden misalignments arise, such as "continuous amounts but broken status" or "physical flow exists but accounting path is missing." Especially in scenarios involving high-frequency, small-amount, multi-batch split accounting entries, materials are dispersed across different cost objects, resulting in disordered or missing lifecycle nodes. These issues are difficult to identify through traditional inventory counts or financial reconciliation, and they affect turnover rate assessment, residual value determination, and compliance supervision of special funds. Therefore, how to construct a unified lifecycle master line for materials based on underlying financial data and identify abnormal nodes has become an urgent technical problem.
Summary of the Invention
[0003] The purpose of this invention is to provide a material lifecycle management system based on underlying financial data to address the shortcomings of the prior art.
[0004] To achieve the above objectives, the present invention provides the following technical solution: a materials lifecycle management system based on underlying financial data, comprising:
[0005] Financial data acquisition module: collects underlying financial voucher data within a preset period and constructs the original journal entry dataset F;
[0006] Semantic fingerprint generation module: performs semantic splitting and multi-field cross-coding on the original journal entry dataset F to generate the material semantic fingerprint M;
[0007] Financial behavior graph construction module: using the material semantic fingerprint M as the main index, topologically sorts the entries by timestamp to construct a directed weighted financial behavior graph G containing amount flow weights;
[0008] Lifecycle path reconstruction module: based on a preset lifecycle constraint matrix, folds and maps the directed weighted financial behavior graph G to generate a unique lifecycle main path P, and identifies abnormal sub-paths that do not satisfy the constraint matrix;
[0009] Result output module: calculates a lifecycle integrity index from the monetary closure and time-interval anomalies of the abnormal sub-paths, and outputs a material lifecycle assessment result including the anomaly type and risk level.
[0010] Preferably, the semantic fingerprint generation module includes the following steps:
[0011] The material name field in the original journal entry dataset F is segmented into words and mapped to word vectors to extract the core semantic feature vectors of the material; synonyms and words that differ only in specification are semantically merged to generate standardized material semantic units;
[0012] The standardized material semantic units are cross-matched against the supplier code, project number, and cost center code fields to construct a multi-dimensional feature combination matrix, and the association weight coefficient of each field is calculated;
[0013] Based on the multi-dimensional feature combination matrix and the association weight coefficients, a feature-weighted encoding operation is performed to generate a unique material semantic feature sequence;
[0014] The material semantic feature sequence is hash-mapped to generate the material semantic fingerprint M.
[0015] Preferably, hash-mapping the material semantic feature sequence to generate the material semantic fingerprint M comprises the following steps:
[0016] The material semantic feature sequence is concatenated in a preset field order to form a standardized feature string, with separators fixing the field boundaries;
[0017] The standardized feature string is length-checked and filtered of abnormal characters to generate a normalized encoded input string, which is tagged with a rule version number to distinguish feature structures generated by different rules;
[0018] The normalized encoded input string is fed into a preset hash mapping algorithm for an irreversible mapping operation that produces a fixed-length encoding result; when collision detection finds a duplicate encoding, the input is given a secondary perturbation and re-hashed;
[0019] The fixed-length encoding result that passes collision verification is taken as the material semantic fingerprint M.
[0020] Preferably, the financial behavior graph construction module includes the following steps:
[0021] Using the material semantic fingerprint M as the filtering condition, the corresponding set of entries is extracted from the original journal entry dataset and sorted from earliest to latest by timestamp to form a time-ordered entry sequence; entries with the same timestamp are further sorted by entry number;
[0022] Based on the time-ordered entry sequence, amount-transmission pairings are constructed according to the debit/credit direction: debit entries are treated as amount inflow nodes and credit entries as amount outflow nodes, and initial directed connection edges are established by matching equal amounts or, failing that, the smallest amount difference;
[0023] For each directed connection edge, an amount flow weight is calculated as the ratio of the edge's amount to the total amount of its outflow node;
[0024] With the fund inflow and outflow nodes as graph nodes and the normalized directed edges and their weights as edge attributes, a directed weighted financial behavior graph G containing the amount flow weights is constructed.
[0025] Preferably, in the directed weighted financial behavior graph G, nodes represent implicit points of business status, and edges represent monetary transmission relationships.
[0026] Preferably, the lifecycle path reconstruction module includes the following steps:
[0027] Construct a lifecycle constraint matrix, which uses business status type as the row and column dimensions. The matrix elements are used to represent whether a monetary transmission relationship is allowed between two business statuses and the threshold of the allowed transmission time interval. Each business status type is predefined according to the account code and the debit / credit direction.
[0028] Each node in the directed weighted financial behavior graph is mapped to the corresponding business status type according to the account code and the debit / credit direction to form a status mapping sequence, and the status mapping sequence is substituted into the life cycle constraint matrix for legality verification.
[0029] For paths that satisfy the lifecycle constraint matrix rules and have continuously closed financial flow weight values, path folding is performed to merge consecutive legal state nodes into a single lifecycle stage node, generating a candidate lifecycle path set.
[0030] From the candidate lifecycle path set, the path with the largest cumulative weight and the fewest state transitions is selected as the unique lifecycle main path; the remaining paths that violate the lifecycle constraint matrix or were not absorbed by folding are identified as abnormal sub-paths.
[0031] Preferably, the result output module includes the following steps:
[0032] The monetary closure of each abnormal sub-path is quantified as follows: the amount flow weights of all directed connection edges in the abnormal sub-path are summed to give the path's cumulative weight, and the ratio of this value to the cumulative weight of the corresponding lifecycle main path gives the amount deviation coefficient.
[0033] For each illegal state transition in the abnormal sub-path, a time-interval anomaly value is calculated by subtracting the corresponding time-interval threshold in the lifecycle constraint matrix from the actual time interval, and all excesses are summed to form the total time excess value.
[0034] The amount deviation coefficient and the total time excess value are normalized and combined to generate a lifecycle integrity index.
[0035] The risk level is determined by the range of the lifecycle integrity index: an index below 0.6 indicates high risk, an index from 0.6 to 0.85 indicates medium risk, and an index above 0.85 indicates low risk. The material lifecycle assessment result, including the anomaly type and risk level, is output.
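The calculations in [0032], [0033], and [0035] can be sketched as follows. This is a minimal illustration, not the patented implementation: the integrity index itself comes from the trained model of [0037] and is taken here as an input, and assigning the boundary values 0.6 and 0.85 to the medium band is an assumption, since the text does not say which band owns them.

```python
from typing import Sequence, Tuple


def amount_deviation_coefficient(sub_path_weights: Sequence[float],
                                 main_path_weights: Sequence[float]) -> float:
    """Ratio of the abnormal sub-path's cumulative edge weight to the main path's ([0032])."""
    main_total = sum(main_path_weights)
    if main_total == 0:
        raise ValueError("main path has zero cumulative weight")
    return sum(sub_path_weights) / main_total


def total_time_excess(transitions: Sequence[Tuple[float, float]]) -> float:
    """Sum of (actual interval - threshold) over transitions that exceed their threshold ([0033]).

    Each item is (actual_interval_days, threshold_days); transitions within the
    threshold contribute nothing."""
    return sum(max(0.0, actual - threshold) for actual, threshold in transitions)


def risk_level(integrity_index: float) -> str:
    """Map a lifecycle integrity index to the risk bands of [0035] (boundaries assumed medium)."""
    if integrity_index < 0.6:
        return "high"
    if integrity_index <= 0.85:
        return "medium"
    return "low"
```

For example, a sub-path with edge weights 0.2 and 0.1 against a main path totalling 1.5 gives a coefficient of about 0.2.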
[0036] Preferably, normalizing the amount deviation coefficient and the total time excess value to generate the lifecycle integrity index comprises:
[0037] The normalized amount deviation coefficient and total time excess value are combined into a comprehensive feature vector, which serves as the input of a machine learning model. The model's prediction target is the lifecycle integrity index label of each comprehensive feature vector, and its training objective is to minimize the sum of prediction errors over all labels; training stops once this sum converges. The lifecycle integrity index is then determined from the model output. The machine learning model is a multinomial regression model.
[0038] The technical effects and advantages provided by the present invention in the above technical solution are as follows:
[0039] 1. By constructing material semantic fingerprints, establishing a directed weighted financial behavior graph, and introducing a lifecycle constraint matrix, this invention achieves a structured reconstruction of the full lifecycle path of materials from underlying financial journal entry data. Compared with traditional methods that rely on inventory systems or aggregated financial data for after-the-fact reconciliation, it can depict monetary transmission relationships and state transition logic at the journal entry level, solving the problem of "continuous amounts but broken status" caused by split accounting, cross-account postings, and cross-project transfers of materials. Through path folding and legality verification, the lifecycle main path is uniquely determined and abnormal sub-paths are identified, significantly improving the accuracy and completeness of lifecycle reconstruction.
[0040] 2. This invention constructs a lifecycle integrity index and uses a multinomial regression model to nonlinearly fuse the amount deviation coefficient and the time-interval anomaly value, achieving quantitative assessment and risk classification of the degree of anomaly. Compared with fixed-weight linear scoring, it can capture the interaction between amount disturbances and time-limit violations, maintaining identification accuracy and stability even in small-amount, high-frequency, highly concealed anomaly scenarios. It improves the sensitivity and accuracy of identifying risks in the material lifecycle, providing reliable data support for refined supervision and risk warning.
Attached Figure Description
[0041] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in this invention. For those skilled in the art, other drawings can be obtained based on these drawings.
[0042] Figure 1 is a flowchart of the system modules of the present invention.
Detailed Implementation
[0043] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0044] Referring to Figure 1, in this embodiment, a materials lifecycle management system based on underlying financial data comprises:
[0045] Financial data acquisition module: Collects underlying financial voucher data within a preset period and constructs the original journal entry dataset F.
[0046] In one embodiment of the present invention, the underlying financial voucher data generated by the target unit within a preset period is first collected to construct the basic dataset required for subsequent lifecycle modeling. The preset period can be a calendar month, a calendar quarter, an accounting year, or a time interval customized according to specific management needs, such as the execution cycle of a research project or the construction cycle of an engineering project. The present invention does not limit the length of the period, but it must ensure that the financial data within the period has complete debit-credit closure characteristics.
[0047] The underlying financial voucher data is original accounting entry-level data, distinct from summary report data or general ledger balance data, and includes at least the following fields:
[0048] The entry number is used to uniquely identify different entry lines in the same accounting document, serving as the basic unit for subsequently constructing monetary flow relationships.
[0049] Account codes, including primary account codes and detailed account codes, are used to reflect the nature of the business and the direction of the amount attribution;
[0050] The debit or credit direction indicates whether the entry is a debit or credit entry, thus determining whether the amount is inflow or outflow.
[0051] Auxiliary accounting items should include at least the project number, cost center code, supplier code, material name field, or other business identification field to enhance the accuracy of material identification.
[0052] The amount field is used to indicate the actual amount incurred for this journal entry;
[0053] Timestamp information, including voucher date and posting time, is used to construct time-series relationships.
[0054] Data acquisition can be achieved by any of the following methods:
[0055] 1. Real-time data capture via the financial system's open API;
[0056] 2. Extraction of entry-level data by field structure through a direct database connection;
[0057] 3. Batch import using standardized data exchange files (such as XML, CSV, or structured data messages);
[0058] 4. Unified scheduling via a financial shared services platform or data middleware platform.
[0059] To ensure data integrity, a unique data identifier ID is generated for each journal entry during the data collection process, and the balance of debit and credit amounts is verified. If any discrepancies in debit and credit amounts or missing fields are found on a voucher, the voucher is marked as abnormal data and is temporarily excluded from the original journal entry dataset.
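The integrity check in [0059] can be sketched as below. It is a minimal illustration under stated assumptions: the field names, the "D"/"C" direction codes, and the 0.005 balance tolerance are hypothetical, since the description lists the required fields but not their identifiers or encodings.

```python
import uuid
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical field names; the description lists these fields but not their identifiers.
REQUIRED_FIELDS = ("voucher_no", "entry_no", "account_code", "direction", "amount", "timestamp")


def build_entry_dataset(entries: List[dict], tolerance: float = 0.005) -> Tuple[List[dict], List[str]]:
    """Split raw entries into accepted rows (each tagged with a unique data ID)
    and the voucher numbers temporarily excluded as abnormal."""
    vouchers: Dict[str, List[dict]] = defaultdict(list)
    for entry in entries:
        vouchers[entry.get("voucher_no")].append(entry)

    accepted: List[dict] = []
    abnormal: List[str] = []
    for voucher_no, rows in vouchers.items():
        # Any missing field marks the whole voucher as abnormal.
        if not all(row.get(field) is not None for row in rows for field in REQUIRED_FIELDS):
            abnormal.append(voucher_no)
            continue
        debit = sum(row["amount"] for row in rows if row["direction"] == "D")
        credit = sum(row["amount"] for row in rows if row["direction"] == "C")
        # Unbalanced debits and credits also exclude the voucher.
        if abs(debit - credit) > tolerance:
            abnormal.append(voucher_no)
            continue
        for row in rows:
            accepted.append({**row, "data_id": uuid.uuid4().hex})
    return accepted, abnormal
```

Grouping by voucher before checking is what lets one bad line exclude its whole voucher, matching the voucher-level rule in the text.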
[0060] After the data collection is completed, all journal entries that meet the field integrity requirements are standardized according to a unified field structure. This includes standardizing field naming, converting time formats, standardizing monetary units, and setting rules for filling null values, thereby forming a unified and computable original journal entry dataset F.
[0061] The original journal entry dataset F serves as the foundational data input for subsequent generation of material semantic fingerprints, construction of financial behavior graphs, and reconstruction of lifecycle paths. Its data granularity is maintained at the "journal entry level" rather than the voucher level or account summary level to ensure that it can reflect the smallest structural unit of monetary transmission.
[0062] Through the above steps, this invention achieves refined collection and structured construction of underlying financial data, providing a data foundation for reconstructing the true lifecycle path of materials based on financial data.
[0063] Semantic fingerprint generation module: Performs semantic splitting and multi-field cross-coding on the original journal entry dataset F to generate material semantic fingerprint M.
[0064] In one embodiment of the present invention, the material name field in the original journal entry dataset F is first processed by word segmentation and word vector mapping. Specifically, a special material domain dictionary is constructed, which consists of entries that appear more than 10 times in historical journal entries and is manually verified to form a standard lexicon. Then, a word segmentation algorithm based on a statistical language model is used to split the material name field into several word units. For each word unit, a word vector model pre-trained from historical material name corpus is used for vectorization. The word vector model is trained by minimizing the context prediction error function so that the Euclidean distance between semantically similar words in the vector space is less than a preset semantic distance threshold of 0.3. When the Euclidean distance between two word units is less than the semantic distance threshold of 0.3, they are merged into the same standard semantic category. At the same time, words with specification differences are split and recombined according to the rule of "main model + specification parameters" to finally generate standardized material semantic units.
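The merge rule of [0064] (word units closer than the 0.3 Euclidean threshold fall into the same standard semantic category) can be sketched as a greedy grouping. This is an assumption-laden illustration: the patent does not specify the grouping procedure, so a simple order-dependent single-link pass is used here, and the two-dimensional vectors are toy values rather than output of the pre-trained word vector model.

```python
import math
from typing import Dict, List, Sequence

SEMANTIC_DISTANCE_THRESHOLD = 0.3


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def merge_semantic_categories(vectors: Dict[str, Sequence[float]],
                              threshold: float = SEMANTIC_DISTANCE_THRESHOLD) -> List[List[str]]:
    """Greedy grouping: a word joins the first category containing a member closer
    than `threshold`; otherwise it starts a new category. Result depends on input order."""
    categories: List[List[str]] = []
    for word, vec in vectors.items():
        for category in categories:
            if any(euclidean(vec, vectors[member]) < threshold for member in category):
                category.append(word)
                break
        else:
            categories.append([word])
    return categories
```

With toy vectors for "fixture" (1.0, 0.0), "jig" (1.1, 0.1), and "module" (0.0, 1.0), the first two are about 0.14 apart and merge, while "module" forms its own category.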
[0065] After generating standardized material semantic units, the standardized material semantic units are associated and matched with supplier codes, project numbers, and cost center codes through multi-field association. Specifically, a multi-dimensional feature combination matrix is constructed with the standardized material semantic units as the row dimension and the supplier codes, project numbers, and cost center codes as the column dimensions. For each dimension field, the co-occurrence probability value with the standardized material semantic units is calculated, and the field weight coefficient is calculated through a field discrimination function. The field weight coefficient is normalized by the reciprocal of the frequency of the field in the entire catalog, so that high-discrimination fields obtain larger weight values, thereby forming a multi-dimensional feature combination matrix containing field weight coefficients.
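The field weighting in [0065] leaves the field discrimination function unspecified. The sketch below assumes one plausible reading: a field's raw weight is its co-occurrence probability with the semantic unit multiplied by the reciprocal of that value's frequency across the whole catalog, and the raw weights are then normalized to sum to 1. The record keys ("unit", "supplier_code", "project_no", "cost_center") are hypothetical.

```python
from collections import Counter
from typing import Dict, List

FIELDS = ("supplier_code", "project_no", "cost_center")  # hypothetical column names


def field_weights(records: List[dict], unit: str) -> Dict[str, float]:
    """Weight each field for one standardized semantic unit (assumed formula):
    P(value | unit) * 1 / catalog_frequency(value), normalized to sum to 1."""
    unit_rows = [r for r in records if r["unit"] == unit]
    raw: Dict[str, float] = {}
    for field in FIELDS:
        # Most common value of this field among the unit's records.
        value, co_count = Counter(r[field] for r in unit_rows).most_common(1)[0]
        co_probability = co_count / len(unit_rows)
        catalog_frequency = sum(1 for r in records if r[field] == value) / len(records)
        raw[field] = co_probability / catalog_frequency
    total = sum(raw.values())
    return {field: weight / total for field, weight in raw.items()}
```

A value shared by every material (such as a common supplier) gets a catalog frequency of 1 and thus a low weight, while a project number seen only with this material is more discriminating and weighs more, which is the behavior [0065] describes.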
[0066] Based on the multidimensional feature combination matrix and field weight coefficients, a feature weighted encoding operation is performed. Specifically, the vector value of the standardized material semantic unit is multiplied by the corresponding field weight coefficient, and the weighted results of each field are summed to generate an ordered sequence of feature values. Then, the feature value sequence is arranged according to a preset field order to form a unique material semantic feature sequence, wherein the field order is fixed as standardized material semantic unit, supplier code, project number, and cost center code.
[0067] After obtaining the semantic feature sequence of the materials, the semantic feature sequence of the materials is concatenated according to the preset field order to form a standardized feature string, and a fixed separator "#" is inserted between each field to solidify the field boundary position and prevent structural ambiguity caused by different field combinations.
[0068] Subsequently, the standardized feature string is subjected to length verification and abnormal character filtering. Specifically, it is determined whether the string length is within the preset range of 8 to 128 characters; a string shorter than 8 or longer than 128 characters is marked as abnormal data and removed. At the same time, all characters other than digits, letters, and the separator are deleted to generate a normalized encoded input string, and a rule version number field is inserted at the beginning of this string to distinguish feature structures generated by different encoding rules.
[0069] The normalized encoded input string is input into a preset hash mapping algorithm for an irreversible mapping operation. The hash mapping algorithm is a fixed-length digest algorithm whose output is always 32 hexadecimal characters. After the fixed-length encoding result is generated, it is compared with the set of encodings already generated. If a duplicate is found, incrementing perturbation values 1, 2, 3, ... are appended in turn to the end of the normalized encoded input string and the hash operation is repeated until a non-duplicate encoding is produced, completing collision elimination.
[0070] Finally, the fixed-length encoding result that passes collision verification is taken as the material semantic fingerprint M and written into the corresponding record of the original journal entry dataset F, where it serves as the unique master index for the subsequent construction of the directed weighted financial behavior graph.
[0071] Financial Behavior Graph Construction Module: Using the material semantic fingerprint M as the main index, the entries are topologically sorted by timestamp to construct a directed weighted financial behavior graph G containing the weight of the amount flow.
[0072] In one embodiment of the present invention, firstly, using the material semantic fingerprint M as the sole filtering condition, all journal entries whose material semantic fingerprint value is equal to M are retrieved from the original journal entry dataset, and the retrieval results are written to a temporary data table. Subsequently, the journal entries in this temporary data table are sorted: the first sorting key is the timestamp, arranged in ascending order of time value; the second sorting key is the journal entry number, arranged in ascending order of integer value of the journal entry number; if both the timestamp and the journal entry number are the same, they are arranged according to the character order of the voucher number to eliminate sorting ambiguity. After sorting, a time-ordered journal entry sequence is generated, and each journal entry in this sequence is assigned a consecutive sequence number 1, 2, 3…n for subsequent location and reference.
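The three-key ordering of [0072] maps directly onto a tuple sort key. In this minimal sketch the `Entry` record type and its attribute names are hypothetical; the sort keys and the consecutive numbering follow the text.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Entry:  # hypothetical record type for one journal entry line
    fingerprint: str
    timestamp: datetime
    entry_no: int
    voucher_no: str
    direction: str  # "D" (debit) or "C" (credit)
    amount: float
    seq: int = 0


def time_ordered_sequence(entries: List[Entry], m: str) -> List[Entry]:
    """Select entries whose fingerprint equals m, sort by (timestamp, entry number,
    voucher number), and number them 1..n. Sets `seq` on the selected entries in place."""
    selected = [e for e in entries if e.fingerprint == m]
    selected.sort(key=lambda e: (e.timestamp, e.entry_no, e.voucher_no))
    for i, entry in enumerate(selected, start=1):
        entry.seq = i
    return selected
```

Python compares tuples element by element, so the voucher number is consulted only when both the timestamp and the entry number tie, which is the disambiguation rule in the text.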
[0073] When constructing the monetary pairing relationship, firstly, all credit entries in the time-ordered journal entry sequence are traversed, each credit entry is marked as a monetary outflow node, and its monetary value is recorded as the amount to be matched; then, debit entries that have not yet been fully matched are searched within its subsequent time range, and the matching operation is performed according to the following steps:
[0074] The first step is to determine whether there is a debit entry with an amount exactly equal to the amount to be matched; if so, a directed connection edge is created from the credit entry to the debit entry, and both entries are marked as matched.
[0075] The second step is to calculate the absolute value of the difference between the amount of all candidate debit entries and the amount to be matched, and select the debit entry with the smallest difference. If the smallest difference is not greater than the preset difference threshold of 5, then a directed connection edge is established.
[0076] The third step is to accumulate multiple debit entries in chronological order if the amount of a single debit entry is less than the amount to be matched, until the accumulated amount equals or exceeds the amount to be matched for the first time. When the accumulated amount exceeds the amount to be matched, the last debit entry is split into a matched part and a remaining part. The matched part is equal to the amount to be matched minus the accumulated amount of the previous sequence, and the remaining part is retained in the original sequence for subsequent matching, thus forming multiple amount splitting connection edges.
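The three matching steps of [0073] to [0076] can be sketched as a fallback chain. The sketch rests on assumptions the text leaves open: for an approximate match (step 2) the edge carries the credit amount and the debit is treated as fully matched, candidates are debits later in the sequence, and if the debits run out during accumulation (step 3) the credit simply stays partly unmatched.

```python
from typing import Dict, List, Tuple

DIFF_THRESHOLD = 5.0


def match_amounts(sequence: List[Tuple[int, str, float]]) -> List[Tuple[int, int, float]]:
    """sequence: (seq_no, 'D' or 'C', amount) in time order.
    Returns directed edges (credit_seq, debit_seq, matched_amount)."""
    remaining: Dict[int, float] = {s: amt for s, d, amt in sequence if d == "D"}
    edges: List[Tuple[int, int, float]] = []
    for c_seq, direction, to_match in sequence:
        if direction != "C":
            continue
        # Candidate debits: later in the sequence and not yet fully matched.
        candidates = [s for s in sorted(remaining) if s > c_seq and remaining[s] > 0]
        if not candidates:
            continue
        # Step 1: a debit with exactly the same amount.
        exact = next((s for s in candidates if remaining[s] == to_match), None)
        if exact is not None:
            edges.append((c_seq, exact, to_match))
            remaining[exact] = 0.0
            continue
        # Step 2: the closest debit, if its difference is within the threshold.
        closest = min(candidates, key=lambda s: abs(remaining[s] - to_match))
        if abs(remaining[closest] - to_match) <= DIFF_THRESHOLD:
            edges.append((c_seq, closest, to_match))
            remaining[closest] = 0.0
            continue
        # Step 3: accumulate debits chronologically, splitting the last one.
        accumulated = 0.0
        for s in candidates:
            take = min(remaining[s], to_match - accumulated)
            edges.append((c_seq, s, take))
            remaining[s] -= take  # any remainder stays available for later credits
            accumulated += take
            if accumulated >= to_match:
                break
    return edges
```

For a credit of 100 followed by debits of 40, 30, and 50, no single debit is within 5 of 100, so step 3 yields edges of 40, 30, and 30, and the remaining 20 of the last debit is kept for later matching.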
[0077] After all initial directed edges are formed, the total amount of each outflow node is calculated by summing the amounts of all its edges, and each edge's amount is divided by this total to obtain its amount flow weight, rounded to 4 decimal places. If rounding causes the weights of a node's edges not to sum to 1, the rounding error is added to or subtracted from the edge with the largest weight, so that the total weight of each node is exactly 1.
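The weight normalization of [0077] can be sketched with `decimal` so that rounding to 4 places and the compensation step are exact. Edges are given as (outflow_node, inflow_node, amount) tuples, a hypothetical layout; amounts are assumed positive so no node total is zero.

```python
from collections import defaultdict
from decimal import ROUND_HALF_UP, Decimal
from typing import Dict, List, Tuple

FOUR_PLACES = Decimal("0.0001")


def flow_weights(edges: List[Tuple[int, int, float]]) -> List[Decimal]:
    """Return one weight per edge: amount / outflow-node total, rounded to 4 decimals,
    with each node's rounding residue moved onto its largest edge."""
    totals: Dict[int, Decimal] = defaultdict(Decimal)
    for src, _dst, amount in edges:
        totals[src] += Decimal(str(amount))
    weights = [
        (Decimal(str(amount)) / totals[src]).quantize(FOUR_PLACES, rounding=ROUND_HALF_UP)
        for src, _dst, amount in edges
    ]
    for src in totals:
        indices = [i for i, edge in enumerate(edges) if edge[0] == src]
        residue = Decimal(1) - sum(weights[i] for i in indices)
        if residue:
            # Compensate on the largest-weight edge so the node's weights sum to exactly 1.
            largest = max(indices, key=lambda i: weights[i])
            weights[largest] += residue
    return weights
```

Three equal edges from one node each round to 0.3333, leaving a residue of 0.0001 that is added to the first (largest, by tie) edge, giving 0.3334, 0.3333, 0.3333.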
[0078] Finally, all inflow and outflow nodes are written into a node set table, each node carrying a node number, entry number, timestamp, and debit/credit direction attribute. All directed edges are written into an edge set table, each edge carrying a start node number, an end node number, and an amount flow weight. The directed weighted financial behavior graph G is then built from the node and edge sets: nodes represent implicit business-state points derived from entries, and edges represent the transmission of funds between business states. Graph G provides a complete structured data foundation for subsequent lifecycle path folding and state mapping.
[0079] Lifecycle path reconstruction module: Based on the preset lifecycle constraint matrix, the directed weighted financial behavior graph G is folded and mapped to generate a unique lifecycle main path P, and abnormal sub-paths that do not meet the constraint matrix are identified.
[0080] In one embodiment of the present invention, a lifecycle constraint matrix is first constructed. The specific process is as follows:
[0081] The first step is to classify and statistically analyze the status of all historical entries in the original journal entry dataset according to account codes and debit / credit directions. Semantic categorization is performed for each account code category. For example, debit accounts such as raw materials and inventory are defined as "inbound" status; credit accounts such as management expenses and production costs are defined as "requisition" status; combinations of internal accounts are defined as "transfer" status; credit accounts such as non-operating expenses or asset impairment are defined as "scrapping" status; and entries such as accumulated depreciation or clearing are defined as "write-off" status. Each business status type is assigned a unique integer number: 1, 2, 3, 4, or 5.
[0082] The second step is to statistically analyze actual transfers between business status types in historical data. For the time-ordered entry sequence of each material semantic fingerprint, the number of transfers between each pair of adjacent business status types is recorded, together with the interval in days between the earlier and later statuses. For each transfer type, the arithmetic mean and standard deviation of these intervals are calculated, and the mean plus twice the standard deviation is used as the time-interval threshold for that transfer. For example, if the average interval from the inbound status to the requisition status is 120 days and the standard deviation is 40 days, the threshold is 120 + 2 × 40 = 200 days.
[0083] The third step is to construct a two-dimensional matrix. The rows and columns of the matrix are business state type numbers. When a certain state transition occurs 5 or more times in historical statistics, the state transition permission flag at that position is set to 1; otherwise, it is set to 0. Simultaneously, the corresponding time interval threshold is entered into the matrix cell. This forms a complete lifecycle constraint matrix.
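Steps two and three of the constraint-matrix construction ([0082] and [0083]) can be sketched together. Two assumptions are labeled: the population standard deviation is used, since the text does not say population or sample, and each observed transition arrives as a (from_state, to_state, interval_days) tuple with states numbered 1 to 5 as in [0081].

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple

N_STATES = 5          # 1 inbound, 2 requisition, 3 transfer, 4 scrapping, 5 write-off
MIN_OCCURRENCES = 5   # a transition seen at least this often is permitted


def interval_threshold(intervals_days: List[float]) -> float:
    """Mean plus two standard deviations (population standard deviation assumed)."""
    return statistics.fmean(intervals_days) + 2 * statistics.pstdev(intervals_days)


def build_constraint_matrix(transitions: List[Tuple[int, int, float]]):
    """transitions: (from_state, to_state, interval_days) observed in history.
    Returns (allowed, threshold), two 5x5 lists indexed [from - 1][to - 1]."""
    observed: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for a, b, days in transitions:
        observed[(a, b)].append(days)
    allowed = [[0] * N_STATES for _ in range(N_STATES)]
    threshold = [[0.0] * N_STATES for _ in range(N_STATES)]
    for (a, b), days in observed.items():
        if len(days) >= MIN_OCCURRENCES:
            allowed[a - 1][b - 1] = 1
        threshold[a - 1][b - 1] = interval_threshold(days)
    return allowed, threshold
```

Intervals of 80 and 160 days have a mean of 120 and a population standard deviation of 40, reproducing the 200-day threshold of the worked example.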
[0084] After constructing the lifecycle constraint matrix, state mapping is performed on each graph node in the directed weighted financial behavior graph. Specifically, the account code and debit / credit direction of the corresponding entry for each graph node are read, the business state type number is determined according to predefined state classification rules, and a state mapping sequence is generated according to the path order in the graph. Simultaneously, the timestamps of each node are extracted for subsequent time verification.
[0085] The validity of the state mapping sequence is then verified. For the i-th and (i+1)-th states in the sequence, the element at the corresponding row and column of the lifecycle constraint matrix is looked up. If the state transition permission flag is 1, the interval in days between the timestamps of the two nodes is calculated; if this interval is less than or equal to the corresponding time interval threshold, the transition is considered valid. If the interval exceeds the threshold, or if the state transition permission flag is 0, the transition is considered an illegal state transition, and its position number in the path is marked as an anomaly location.
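The mapping and verification steps in [0084] and [0085] can be sketched as a single pass over adjacent pairs. This is a sketch, not the patent's implementation: the function name, the 0-based position convention, and the small example matrix are assumptions. The matrix reuses the thresholds reported in [0091] and [0092] and assumes that statuses 1, 2, and 4 are inbound, requisition, and scrapping.

```python
def verify_sequence(states, days, matrix):
    """Check each adjacent pair of a state mapping sequence.

    states: business status numbers (1..5) in path order
    days:   node timestamps in days, same length as states
    matrix: 5x5 cells of (permission_flag, threshold_days)
    Returns (position, anomaly_type) for each illegal transition; position i
    refers to the transition from the i-th to the (i+1)-th node (0-based).
    """
    anomalies = []
    for i in range(len(states) - 1):
        flag, threshold = matrix[states[i] - 1][states[i + 1] - 1]
        if flag == 0:
            anomalies.append((i, "illegal state transition"))
        elif days[i + 1] - days[i] > threshold:
            anomalies.append((i, "time interval exceeded"))
    return anomalies


# Hypothetical matrix: every cell forbidden except two permitted transitions.
MATRIX = [[(0, None)] * 5 for _ in range(5)]
MATRIX[0][1] = (1, 242)    # inbound -> requisition
MATRIX[1][3] = (1, 1530)   # requisition -> scrapping
```

For example, verify_sequence([1, 4], [0, 10], MATRIX) flags position 0 as an illegal state transition because the inbound-to-scrapping cell has a permission flag of 0.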
[0086] Path folding is performed on each continuous path segment consisting entirely of valid state transitions. Specifically, starting from the origin of the segment, the amount flow weight values of the directed edges along the path are accumulated. Once the accumulated weight value reaches 0.95 or higher, the monetary flow of the segment is considered essentially closed, and the state nodes within the segment are merged into a single lifecycle stage node. The start time of this lifecycle stage node is the time of the first node in the segment, its end time is the time of the last node in the segment, and its accumulated amount weight value is the sum of the weight values of all connecting edges in the segment. All paths are traversed according to these rules to generate a set of candidate lifecycle paths.
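The folding rule can be sketched as a running sum over a segment's edge weights. Two points are assumptions that the text leaves open: once a stage closes, accumulation restarts from that stage's last node, and any trailing edges that never reach the 0.95 threshold are returned as an unfolded tail. The StageNode type and fold_segment name are hypothetical.

```python
from dataclasses import dataclass

CLOSURE_THRESHOLD = 0.95  # accumulated weight at which a segment counts as closed


@dataclass
class StageNode:
    start_day: int   # time of the first node in the folded segment
    end_day: int     # time of the last node in the folded segment
    weight: float    # sum of the edge weights inside the segment


def fold_segment(days, weights):
    """Fold one valid segment into lifecycle stage nodes.

    days:    timestamps (in days) of the n nodes of the segment
    weights: the n - 1 amount flow weights of the edges between them
    Returns (stages, tail), where tail lists the days of unfolded nodes.
    """
    stages = []
    seg_start, acc = 0, 0.0
    for i, w in enumerate(weights):
        acc += w
        if acc >= CLOSURE_THRESHOLD:
            # Merge nodes seg_start .. i + 1 into one stage node.
            stages.append(StageNode(days[seg_start], days[i + 1], acc))
            seg_start, acc = i + 1, 0.0
    tail = days[seg_start:] if seg_start < len(days) - 1 else []
    return stages, tail
```

For instance, fold_segment([0, 10, 20, 30], [0.5, 0.3, 0.2]) yields one stage node spanning days 0 to 30 with an accumulated weight of 1.0 and no tail.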
[0087] Finally, the candidate lifecycle paths are screened. First, the total cumulative weight value of each candidate path is calculated; second, the number of lifecycle stage nodes in the path is counted, and the number of state transitions is taken as the number of stage nodes minus 1. The paths are then sorted in descending order of cumulative weight value. When the cumulative weight values of two paths differ by less than 0.01, the path with fewer state transitions is preferred; if the numbers of transitions are also equal, the path with the earliest start time is preferred. The selected path is determined as the unique lifecycle main path. The remaining unselected paths and the paths containing illegal state transitions are determined as abnormal sub-paths, and their anomaly type is recorded as "illegal state transition" or "time interval exceeded".
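The screening rule can be sketched as follows. The 0.01 rule is interpreted here as: any legal candidate within 0.01 of the best cumulative weight counts as tied with it. This interpretation, the Candidate fields, and the function name are assumptions. Because only the top path is needed, the sketch selects it directly instead of sorting the full list.

```python
from dataclasses import dataclass

TIE_TOLERANCE = 0.01


@dataclass
class Candidate:
    path_id: str
    total_weight: float        # cumulative amount weight of the path
    n_stages: int              # number of lifecycle stage nodes
    start_day: int             # start time of the first stage node
    has_illegal_transition: bool = False

    @property
    def n_transitions(self) -> int:
        return self.n_stages - 1


def select_main_path(candidates):
    """Return (main_path, abnormal_sub_paths)."""
    legal = [c for c in candidates if not c.has_illegal_transition]
    if not legal:
        return None, list(candidates)
    best = max(c.total_weight for c in legal)
    # Paths within TIE_TOLERANCE of the best weight are treated as tied.
    tied = [c for c in legal if best - c.total_weight < TIE_TOLERANCE]
    # Tie-break: fewer state transitions first, then earliest start time.
    main = min(tied, key=lambda c: (c.n_transitions, c.start_day))
    abnormal = [c for c in candidates if c is not main]
    return main, abnormal
```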
[0088] To verify the beneficial effects of the lifecycle path reconstruction steps, the underlying financial journal entries of three units within a group over 12 consecutive months were selected as test samples. The data comprised 218,764 journal entries in total, from which material semantic fingerprinting produced 3,412 valid material semantic fingerprints.
[0089] The results of constructing the lifecycle constraint matrix include:
[0090] Statistical analysis of the historical journal entries of the past three years yielded the following results for selected state transitions:
[0091] Warehouse entry status → requisition status: 1,286 occurrences, with an average time interval of 138 days and a standard deviation of 52 days; the time interval threshold is therefore set to 242 days (138 + 2 × 52);
[0092] Requisition status → scrap status: 624 occurrences, with an average time interval of 910 days and a standard deviation of 310 days; the time interval threshold is therefore set to 1,530 days (910 + 2 × 310);
[0093] Warehouse entry status → scrap status: 7 occurrences, fewer than twice the preset occurrence threshold of 5 (that is, fewer than 10); the state transition permission flag is therefore set to 0.
[0094] This completes the lifecycle constraint matrix, defining 9 types of legal state transition relationships and 16 types of illegal state transition relationships.
[0095] One type of high-frequency, low-value material was selected as the test object. It has 742 journal entries, which correspond to 48 material semantic fingerprints.
[0096] Without the method of this invention, paths are reconstructed using the traditional time-series concatenation method, which links entries by time order only. The results were:
[0097] Number of constructed paths: 112;
[0098] Proportion of paths with obvious state reversals: 38.4%;
[0099] Proportion of paths with an amount closure degree below 0.9: 42.1%;
[0100] Number of abnormal sub-paths identified: 0.
[0101] Because state validity constraints and path folding are not applied, this method cannot distinguish legitimate transitions from abnormal jumps.
[0102] After using the method of this invention:
[0103] Validity is verified using the lifecycle constraint matrix, and path folding is performed:
[0104] Number of candidate lifecycle paths constructed: 64;
[0105] Number of unique lifecycle master paths generated: 48 (consistent with the number of material semantic fingerprints);
[0106] Number of abnormal sub-paths: 16;
[0107] Anomaly detection accuracy (verified by manual sampling): 92.6%;
[0108] Proportion of main paths with an amount closure degree of 0.95 or higher: 95.8%.
[0109] The identified abnormal sub-paths mainly include:
[0110] 9 illegal state transitions (for example, inventory written off directly from the inbound status);
[0111] 7 time interval overruns (more than 242 days from warehouse entry to requisition).
[0112] As the data in the above embodiment show, the present invention filters out illegal state transitions through the lifecycle constraint matrix, reducing the proportion of paths with state reversals from 38.4% to 0%. Through path folding and amount-weight closure determination, it markedly improves the amount closure of the main paths: 95.8% of main paths reach a closure degree of 0.95 or higher, whereas 42.1% of the paths built by the traditional method have a closure degree below 0.9. It identifies 16 abnormal sub-paths that the traditional time-series concatenation method cannot identify, accounting for 25% of all candidate paths (16 of 64). For small-amount, high-frequency materials, it still maintains an anomaly identification accuracy of 92.6%, demonstrating that the invention can effectively identify hidden anomalies.
[0113] Therefore, by constructing a lifecycle constraint matrix and combining it with a path folding algorithm, this invention uniquely determines the main path of the material lifecycle and identifies abnormal sub-paths. This significantly improves the accuracy of path reconstruction and the ability to detect anomalies, and it overcomes the inability of traditional methods to identify paths whose monetary amounts are continuous but whose state sequence is broken.
[0114] Results output module: based on the amount closure degree and the time interval anomalies of the abnormal sub-paths, this module calculates the life cycle integrity index and outputs material life cycle assessment results, including the anomaly type and risk level.
[0115] In one embodiment of the present invention, the monetary closure degree of the abnormal sub-path is first quantitatively calculated.
[0116] Specifically, the amount flow weight value of each directed edge in the abnormal sub-path is read, and these values are summed to obtain the path cumulative weight value W1. At the same time, the cumulative weight value W0 of the corresponding lifecycle main path is read. W0 is defined as the sum of the amount flow weight values of all connecting edges in the main path, and it always equals 1. The amount deviation coefficient K1 is then calculated as W1 divided by W0, that is, as the share of the abnormal sub-path's amount relative to the main path's amount.
[0117] For example, if the weights of the connecting edges of an abnormal sub-path sum to 0.18, the amount deviation coefficient is 0.18. The larger the amount deviation coefficient, the more strongly the abnormal path disturbs the overall amount structure.
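The ratio K1 = W1 / W0 is short enough to state directly in code. This is a minimal sketch with a hypothetical function name. Splitting the example's 0.18 into two edges of 0.10 and 0.08 is also an illustrative assumption.

```python
def amount_deviation_coefficient(sub_path_edge_weights, main_path_weight=1.0):
    """K1 = W1 / W0, where W1 sums the sub-path's edge weights.

    W0 defaults to 1 because the text states that the main path's
    cumulative weight always equals 1.
    """
    w1 = sum(sub_path_edge_weights)
    return w1 / main_path_weight
```

Because W0 is always 1, K1 numerically equals W1. The division is kept only so that the formula mirrors the definition.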
[0118] Next, the time interval anomaly values of the abnormal sub-path are calculated as follows:
[0119] The first step is to read, for each illegal state transition in the abnormal sub-path, the timestamp Ti of the preceding node and the timestamp T(i+1) of the following node, and to calculate the actual time interval ΔTi = T(i+1) − Ti in days;
[0120] The second step is to find the corresponding time interval threshold θi for state transition from the life cycle constraint matrix;
[0121] The third step is to subtract the time interval threshold θi from the actual time interval ΔTi. If the result is positive, it is recorded as the time overrun difference Di of that state transition; otherwise, Di is recorded as 0;
[0122] The fourth step is to sum all time overrun differences Di in the abnormal sub-path to obtain the total time overrun value S.
[0123] For example, if an abnormal sub-path contains 3 illegal state transitions with time overrun differences of 20 days, 45 days, and 0 days, the total time overrun value S is 65 days.
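The four steps above reduce to S = Σ max(0, ΔTi − θi). The sketch below assumes that every listed transition has a numeric threshold. The text does not say which θi applies to a transition whose permission flag is 0. The timestamps in the usage example are hypothetical values chosen to reproduce the 20-, 45-, and 0-day differences.

```python
def total_time_overrun(transitions):
    """Sum of D_i = max(0, dT_i - theta_i) over the listed transitions.

    transitions: (t_prev_day, t_next_day, threshold_days) per transition.
    """
    total = 0
    for t_prev, t_next, theta in transitions:
        delta_t = t_next - t_prev           # actual interval dT_i in days
        total += max(0, delta_t - theta)    # negative differences count as 0
    return total
```

For example, total_time_overrun([(0, 262, 242), (0, 287, 242), (100, 300, 242)]) gives 20 + 45 + 0 = 65 days.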
[0124] After the amount deviation coefficient K1 and the total time overrun value S are obtained, both are normalized.
[0125] Specifically, the maximum value Kmax and the minimum value Kmin of the amount deviation coefficient over all abnormal sub-path samples are calculated, and the amount deviation coefficient K1 of the current abnormal sub-path is normalized as (K1 − Kmin) / (Kmax − Kmin), so that it falls within the range 0 to 1;
[0126] Similarly, the maximum value Smax and the minimum value Smin of the total time overrun value are calculated, and the current total time overrun value S is normalized as (S − Smin) / (Smax − Smin), so that it also falls within the range 0 to 1.
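Min-max normalization of both quantities, and pairing them into feature vectors, can be sketched as below. The function names are hypothetical. Mapping every value to 0 when the maximum equals the minimum is an assumption, because the text does not cover that case.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] as (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: equal max and min is not covered by the text; map to 0.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def feature_vectors(k_values, s_values):
    """Pair normalized K1 and S values of all sub-paths into V = (X1, X2)."""
    return list(zip(min_max_normalize(k_values), min_max_normalize(s_values)))
```

For example, min_max_normalize([10, 20, 30]) returns [0.0, 0.5, 1.0].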
[0127] After normalization, a comprehensive feature vector V with two elements is formed: the first element is the normalized amount deviation coefficient, and the second is the normalized total time overrun value. V serves as the input of a polynomial regression model, constructed as follows. The model adopts a second-order polynomial structure with five independent variables: the normalized amount deviation coefficient X1, the normalized total time overrun value X2, the square of X1, the square of X2, and the product of X1 and X2. The output variable is the life cycle integrity index Y.
[0128] The model structure is as follows: the life cycle integrity index equals the sum of the above 5 independent variables, each multiplied by its corresponding regression coefficient, plus a constant term.
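Written out, the model structure described in [0127] and [0128] is as follows. The coefficient symbols β0 (constant term) to β5 are notation introduced here for illustration and do not appear in the source.

```latex
\hat{Y} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1^{2} + \beta_4 X_2^{2} + \beta_5 X_1 X_2
```

The model is nonlinear in X1 and X2 but linear in the coefficients, so it can be fitted by ordinary least squares or by gradient descent on the squared-error loss.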
[0129] The model training steps are as follows:
[0130] The first step is to select no fewer than 500 abnormal sub-path samples that have been manually reviewed and labeled with the lifecycle integrity index as training data;
[0131] The second step is to initialize all regression coefficients to 0;
[0132] The third step is to construct the loss function, which is defined as the sum of squares of the differences between the predicted lifecycle integrity index and the manually labeled value in all samples.
[0133] The fourth step is to update the regression coefficients using the batch gradient descent algorithm. In each iteration, the gradient values of all samples are calculated and the coefficients are updated.
[0134] The fifth step is to judge that training has converged, and to stop training, when the loss function decreases by less than 0.001 in each of 10 consecutive iterations.
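The five training steps can be sketched in plain Python. This is an illustrative implementation under assumptions: the learning rate, the max_iter safety cap, and all names are hypothetical. The gradient is divided by the sample count, which only rescales the step size; the convergence test is applied to the full sum-of-squares loss, as in the fifth step.

```python
N_COEF = 6  # constant term + X1, X2, X1^2, X2^2, X1*X2


def poly_features(x1, x2):
    """Second-order polynomial terms, with a leading 1 for the constant term."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]


def predict(beta, x1, x2):
    return sum(b * f for b, f in zip(beta, poly_features(x1, x2)))


def sse(beta, samples):
    """Step three: sum of squared errors against the manual labels."""
    return sum((predict(beta, x1, x2) - y) ** 2 for (x1, x2), y in samples)


def train(samples, lr=0.1, tol=1e-3, patience=10, max_iter=100_000):
    """Batch gradient descent on the SSE loss.

    samples: list of ((x1, x2), y), with y the manually labelled index.
    Stops after `patience` consecutive iterations whose loss decrease is < tol.
    """
    beta = [0.0] * N_COEF            # step two: all coefficients start at 0
    prev_loss = sse(beta, samples)
    small_drops = 0
    for _ in range(max_iter):
        grad = [0.0] * N_COEF
        for (x1, x2), y in samples:  # step four: gradient over all samples
            err = predict(beta, x1, x2) - y
            for k, f in enumerate(poly_features(x1, x2)):
                grad[k] += 2.0 * err * f
        beta = [b - lr * g / len(samples) for b, g in zip(beta, grad)]
        loss = sse(beta, samples)
        small_drops = small_drops + 1 if prev_loss - loss < tol else 0
        if small_drops >= patience:  # step five: convergence criterion
            break
        prev_loss = loss
    return beta
```

A closed-form least-squares solve would also fit this model. The gradient-descent loop is shown only because it is the procedure the text describes.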
[0135] After training is complete, the comprehensive feature vector of the current abnormal sub-path is input into the model to obtain the predicted lifecycle integrity index.
[0136] Finally, risk levels are determined based on the lifecycle integrity index. The risk level classification criteria are as follows:
[0137] When the life cycle integrity index is less than 0.6, it is defined as a high-risk level;
[0138] When the life cycle integrity index is greater than or equal to 0.6 and less than 0.85, it is defined as a medium-risk level;
[0139] When the life cycle integrity index is greater than or equal to 0.85, it is defined as a low-risk level.
[0140] The output results include: 1) the material semantic fingerprint, 2) the abnormal sub-path number, 3) the amount deviation coefficient, 4) the total time overrun value, 5) the life cycle integrity index, 6) the risk level, and 7) the anomaly type classification identifier.
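The risk-level thresholds in [0137] to [0139] and the seven output items in [0140] can be sketched as a classifier plus a result record. The field names, the string labels, and the assess helper are hypothetical.

```python
from dataclasses import dataclass


def risk_level(index: float) -> str:
    """Map a life cycle integrity index to a risk level ([0137]-[0139])."""
    if index < 0.6:
        return "high"
    if index < 0.85:
        return "medium"
    return "low"


@dataclass
class AssessmentResult:
    # The seven output items of [0140]; field names are hypothetical.
    fingerprint: str
    sub_path_id: int
    amount_deviation: float
    total_time_overrun: float
    integrity_index: float
    risk_level: str
    anomaly_type: str


def assess(fingerprint, sub_path_id, k1, s, index, anomaly_type):
    """Build one output record, deriving the risk level from the index."""
    return AssessmentResult(fingerprint, sub_path_id, k1, s, index,
                            risk_level(index), anomaly_type)
```

Note that an index of exactly 0.6 falls in the medium band and exactly 0.85 in the low band, following the "greater than or equal to" wording of the description.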
[0141] Through the above quantitative calculation and model prediction, this invention transforms the impact of abnormal paths, which is otherwise difficult to quantify, into a computable life cycle integrity index. Combined with the risk level, this yields structured output and improves the accuracy and stability of anomaly identification.
[0142] To verify the beneficial effects of the life cycle integrity index and risk level classification, the underlying financial journal entries of a group company over the past 24 months were selected as the verification sample. The data comprised 426,315 journal entries in total. Material semantic fingerprinting produced 5,872 valid material semantic fingerprints, and 1,146 abnormal sub-paths were identified.
[0143] Manually labeled sample construction: 600 of the 1,146 abnormal sub-paths were randomly selected, and their risk levels were independently assessed by 3 financial staff members with more than 5 years of auditing experience. The majority opinion was used as the manually labeled life cycle integrity index label.
[0144] The distribution of manually annotated results is as follows:
[0145] High-risk level: 182 items;
[0146] Medium-risk level: 263 items;
[0147] Low-risk level: 155 items.
[0148] These 600 samples were used as training and validation data for the polynomial regression model.
[0149] Model training results:
[0150] 500 samples were used as training data and 100 samples were used as test data.
[0151] Training ran for 148 iterations and stopped once the loss function had decreased by less than 0.001 in each of 10 consecutive iterations.
[0152] The final sum of squared training errors was 0.083, and the mean absolute error of the test set was 0.037.
[0153] Compared with the fixed-weight scoring method:
[0154] On the same test data, a traditional fixed-weight scoring method was used for comparison. It assigns a weight of 0.6 to the amount deviation coefficient and 0.4 to the total time overrun value and computes the score by linear weighting.
[0155] The comparison results are as follows:
[0156]
[0157] It can be seen that the method of the present invention, compared with the fixed-weight scoring method:
[0158] The overall accuracy of risk level identification improved by 11.8 percentage points; the recall rate for identifying high-risk abnormal paths improved by 15.3 percentage points; and the false positive rate decreased by 6.7 percentage points.
[0159] Further, high-frequency materials with amounts below 5,000 yuan and more than 10 entries were selected as a special test set, comprising 216 abnormal sub-paths in total.
[0160] In this concealed "small-amount, high-frequency" scenario:
[0161] The fixed-weight scoring method achieved an accuracy of 74.5%;
[0162] The method of this invention achieved an accuracy of 90.3%.
[0163] This demonstrates that the present invention has a more significant advantage in identifying concealed anomalies.
[0164] As the data in the above examples show, the life cycle integrity index comprehensively quantifies both the disturbance to the amount structure and the degree of time anomaly. The nonlinear fit of the polynomial regression model effectively captures the interaction between amount and time. High recognition accuracy is maintained even in scenarios with complex account splitting and small-amount, high-frequency entries. Compared with the linear fixed-weight method, the accuracy and stability of risk classification are significantly improved.
[0165] Therefore, by introducing the life cycle integrity index and a polynomial regression model, this invention accurately quantifies the risk level of abnormal paths, significantly improving the reliability and technical effectiveness of material life cycle assessment.
[0166] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any changes or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application.
Claims
1. A materials lifecycle management system based on underlying financial data, characterized in that it comprises: a financial data acquisition module, which collects underlying financial voucher data within a preset period and constructs the original journal entry dataset F; a semantic fingerprint generation module, which performs semantic splitting and multi-field cross-coding on the original journal entry dataset F to generate the material semantic fingerprint M; a financial behavior graph construction module, which uses the material semantic fingerprint M as the main index, topologically sorts the entries by timestamp, and constructs a directed weighted financial behavior graph G containing amount flow weights; a lifecycle path reconstruction module, which, based on a preset lifecycle constraint matrix, folds and maps the directed weighted financial behavior graph G to generate a unique lifecycle main path P and identifies the abnormal sub-paths that do not satisfy the constraint matrix; and a results output module, which, based on the amount closure degree and the time interval anomalies of the abnormal sub-paths, calculates a life cycle integrity index and outputs material life cycle assessment results, including the anomaly type and risk level.
2. The materials lifecycle management system based on underlying financial data according to claim 1, characterized in that the semantic fingerprint generation module performs the following steps: segmenting the material name field in the original journal entry dataset F and mapping the segments to word vectors to extract the core semantic feature vectors of the material, and semantically merging synonyms and variant specification terms to generate standardized material semantic units; matching the standardized material semantic units against the supplier code, project number, and cost center code fields to construct a multi-dimensional feature combination matrix, and calculating the association weight coefficient between the fields; performing a feature-weighted encoding operation based on the multi-dimensional feature combination matrix and the association weight coefficients to generate a unique material semantic feature sequence; and hash-mapping the material semantic feature sequence to generate the material semantic fingerprint M.
3. The materials lifecycle management system based on underlying financial data according to claim 2, characterized in that hash-mapping the material semantic feature sequence to generate the material semantic fingerprint M comprises the following steps: concatenating the material semantic feature sequence in a preset field order to form a standardized feature string, with separators fixing the field boundaries; checking the length of the standardized feature string and filtering abnormal characters to generate a normalized encoding input string, which is tagged with a version number to distinguish feature structures generated by different rules; inputting the normalized encoding input string into a preset hash mapping algorithm for an irreversible mapping operation to generate a fixed-length encoding result, and applying secondary perturbation to duplicate encodings found by collision detection; and determining the fixed-length encoding result that passes collision verification as the material semantic fingerprint M.
4. The materials lifecycle management system based on underlying financial data according to claim 1, characterized in that the financial behavior graph construction module performs the following steps: using the material semantic fingerprint M as the filtering condition, extracting the corresponding set of entries from the original journal entry dataset and sorting them by timestamp from earliest to latest to form a time-ordered entry sequence, with entries sharing the same timestamp sorted secondarily by entry number; based on the time-ordered entry sequence, constructing amount transmission pairings according to the debit/credit direction, in which debit entries are treated as amount inflow nodes and credit entries as amount outflow nodes, and establishing initial directed connecting edges according to the principle of equal amounts or minimum difference; for each directed connecting edge, calculating the amount flow weight value, which is determined by the proportion of the edge's amount to the total amount of the current node; and constructing the directed weighted financial behavior graph G containing amount flow weights, with the amount inflow and outflow nodes as graph nodes and the normalized directed edges and their weights as graph edge attributes.
5. The materials lifecycle management system based on underlying financial data according to claim 4, characterized in that the nodes of the directed weighted financial behavior graph G represent implicit business status points and its edges represent amount transmission relationships.
6. The materials lifecycle management system based on underlying financial data according to claim 1, characterized in that the lifecycle path reconstruction module performs the following steps: constructing a lifecycle constraint matrix whose row and column dimensions are the business status types, each matrix element indicating whether an amount transmission relationship is permitted between two business statuses and the threshold of the permitted transmission time interval, and each business status type being predefined according to the account code and the debit/credit direction; mapping each node of the directed weighted financial behavior graph to the corresponding business status type according to its account code and debit/credit direction to form a status mapping sequence, and substituting the status mapping sequence into the lifecycle constraint matrix for legality verification; performing path folding on paths that satisfy the rules of the lifecycle constraint matrix and whose amount flow weight values are continuously closed, merging consecutive legal status nodes into a single lifecycle stage node to generate a candidate lifecycle path set; and selecting from the candidate lifecycle path set the path with the largest cumulative weight and the fewest state transitions as the unique lifecycle main path, the remaining paths that do not satisfy the lifecycle constraint matrix or are not absorbed by folding being identified as abnormal sub-paths.
7. The materials lifecycle management system based on underlying financial data according to claim 1, characterized in that the results output module performs the following steps: quantitatively calculating the amount closure degree of each abnormal sub-path by taking the sum of the amount flow weight values of all directed connecting edges in the abnormal sub-path as the path cumulative weight value and dividing it by the cumulative weight value of the corresponding lifecycle main path to obtain the amount deviation coefficient; calculating the time interval anomaly value of each illegal state transition in the abnormal sub-path by subtracting the corresponding time interval threshold in the lifecycle constraint matrix from the actual time interval, and summing all excess differences to form the total time overrun value; normalizing the amount deviation coefficient and the total time overrun value and generating a life cycle integrity index; determining the risk level according to the range of the life cycle integrity index, namely a high-risk level when the index is less than 0.6, a medium-risk level when it is greater than or equal to 0.6 and less than 0.85, and a low-risk level when it is greater than or equal to 0.85; and outputting the material life cycle assessment results, including the anomaly type and risk level.
8. The materials lifecycle management system based on underlying financial data according to claim 7, characterized in that normalizing the amount deviation coefficient and the total time overrun value to generate the life cycle integrity index comprises: converting the normalized amount deviation coefficient and total time overrun value into a comprehensive feature vector; using the comprehensive feature vector as the input of a machine learning model whose prediction objective is the life cycle integrity index label of each comprehensive feature vector and whose training objective is to minimize the sum of prediction errors over all life cycle integrity index labels; training the machine learning model until the sum of prediction errors converges, at which point training stops; and determining the life cycle integrity index from the model output, the machine learning model being a polynomial regression model.