An AI-based intelligent auditing and compliance risk prevention and control method for college finance

By constructing an AI-based intelligent audit framework, combining Word2Vec, LSTM, and random forest models, the problems of low accuracy in compliance identification, high risk during time-sensitive periods, and difficulty in tracing logical conflicts in university financial audits have been solved, achieving high efficiency, traceability, and compliance in university financial management.

CN122199183A · Pending · Publication Date: 2026-06-12 · 田钊玮

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: 田钊玮
Filing Date: 2026-03-26
Publication Date: 2026-06-12

AI Technical Summary

Technical Problem

The financial audit of universities has problems such as low accuracy in identifying content compliance, lack of dynamic weighting of project types and a mechanism to amplify the risk of project completion, inability to quantify logical conflicts in sub-tasks, and untraceable audit logs, making it difficult to meet the compliance requirements of scientific research management.

Method used

We construct an AI-based intelligent audit framework that deeply integrates Word2Vec, LSTM, and random forest models to achieve structured extraction of data sources, keyword matching and vector generation, time sensitivity determination, sub-task logical conflict rule base query, and risk scoring judgment, thus forming a traceable chain of responsibility.

Benefits of technology

It has improved the efficiency of financial auditing and compliance risk control capabilities of universities, enabling them to proactively identify high-risk and ambiguous reimbursements, reduce the risk of violations, ensure the compliance and auditability of every reimbursement, and build a closed-loop financial risk control ecosystem.

✦ Generated by Eureka AI based on patent content.

Figure CN122199183A_ABST

Abstract

This application discloses an AI-based intelligent auditing and compliance risk prevention and control method for university finance, comprising: structured extraction of data sources and field standardization; loading a scientific research project terminology library, generating reimbursement and project keyword vectors based on a Word2Vec model and calculating a matching score, learning the keyword probability distribution from historical reimbursement texts based on an LSTM model to calculate ambiguity, and determining time sensitivity and calculating the corrected ambiguity in combination with the project completion date; generating a logical conflict rule base for scientific research sub-tasks from historical conflict data based on a random forest model and calculating the conflict probability; determining a weight coefficient according to the project type and calculating a risk score from the corrected ambiguity, matching score, project cycle consistency, time sensitivity, and conflict probability; and marking the application as high risk when the risk score exceeds a threshold, otherwise approving it, and recording an audit log to form a chain of responsibility. The method promotes more scientific and compliant financial governance.

Description

Technical Field

[0001] This invention relates to the field of financial auditing and risk control technology in higher education institutions, and in particular to an AI-based intelligent financial auditing and compliance risk control method for higher education institutions.

Background Technology

[0002] In the field of university research funding management, existing financial auditing technologies have long relied on manual experience or simple rule-based matching systems, which have systemic defects and are difficult to meet the requirements of the national reform of streamlining administration and delegating power for precise supervision and efficient service of research funding.

[0003] Traditional manual review methods are inefficient and susceptible to subjective factors, leading to inconsistent standards. Some automated systems rely solely on keyword matching, failing to identify frequently used semantically ambiguous expressions in research scenarios, resulting in a high rate of missed high-risk reimbursement reports. Furthermore, existing technologies severely lack handling of project lifecycle characteristics: as the project completion phase approaches, the risk of ambiguous reimbursement is significantly amplified, but the lack of a quantification mechanism creates audit blind spots. Logical conflicts in sub-tasks rely on human experience for judgment, resulting in insufficient accuracy in conflict detection and lagging rule base updates. In addition, data processing is flawed: reimbursement texts lack filtering of invalid words, project sub-task relationship diagrams lack structured validation, and audit logs lack accountability traceability, violating the closed-loop requirements for compliant management of research funding.

[0004] While AI models such as Word2Vec, LSTM, and Random Forest have made progress in the field of natural language processing, current technologies have not yet systematically integrated them into the financial auditing scenario of universities. Traditional applications use these models in isolation: Word2Vec is used for general text similarity calculation but is not combined with a research terminology database to dynamically generate probability distributions; LSTM models are mostly used for time series prediction but lack dynamic weighting mechanisms designed for the sensitive period of project completion; and the Random Forest rule base relies on manual formulation and cannot automatically learn from historical conflict data. This fragmented application leaves the models' value unrealized: ambiguity quantification is not linked to time sensitivity, so high-risk ambiguous reimbursement claims cannot be amplified and identified during the project completion stage, and the conflict rule base is not adapted to the logical strength of research sub-tasks, resulting in a high misjudgment rate.

[0005] Therefore, financial auditing in universities has long been trapped in a triple dilemma: semantic ambiguity makes identification difficult, time-sensitive periods pose high risks, and logical conflicts are difficult to trace. There is an urgent need for an intelligent framework that deeply integrates multiple models to organically unify semantic understanding, dynamic risk quantification, and closed-loop auditing.

Summary of the Invention

[0006] The purpose of this invention is to provide an AI-based intelligent financial auditing and compliance risk control method for universities.

[0007] The problem this invention aims to solve is that the existing university research reimbursement review process suffers from low accuracy in identifying compliance issues, lack of dynamic weighting of project types and mechanisms to amplify project completion risks, inability to quantify logical conflicts in sub-tasks, and high dispute rates due to untraceable audit logs, making it difficult to meet the compliance requirements of research management.

[0008] An AI-based intelligent financial auditing and compliance risk control method for universities, employing the following technical solution:

S1: Structured data source extraction: extract the expense description text and amount from the university's financial system; extract the budget details, project type identifier, project stage identifier, completion date, and project subtask relationship diagram from the project management system. Perform field standardization on the data: unify the date format and the monetary unit, and filter invalid text to ensure the semantic consistency and calculation basis of the input data.

S2: Load the research project terminology database: load the corresponding research project terminology database according to the project type identifier. Keyword extraction and vector generation: based on the Word2Vec model, keywords belonging to the terminology database are extracted from the reimbursement description text, and reimbursement keyword vectors are generated; based on the Word2Vec model, keywords belonging to the terminology database are extracted from the project application budget details text, and project keyword vectors are generated. Calculate the matching score: calculate the cosine similarity between the reimbursement keyword vector and the project keyword vector to obtain the matching score. Ambiguity calculation: ambiguity is calculated based on the probability distribution of keywords in the terminology database, and the probability distribution data is generated from historical expense reimbursement texts based on an LSTM neural network model. Determine time sensitivity: determine time sensitivity based on the project stage identifier and completion date. Calculate the corrected ambiguity: calculate the corrected ambiguity to quantify the amplification effect of ambiguity in the final stage of the project.

S3: Extract and associate subtask identifiers: extract the current subtask identifier from the subtask field of the expense report and obtain the list of associated subtasks from the project management system. Query the logical conflict rule base of scientific research subtasks: based on the preset logical conflict rule base of scientific research subtasks, which is generated from historical project conflict data based on the random forest model, check the logical relationship between the current subtask and related subtasks. Calculate the conflict probability: the conflict probability is calculated based on the matching degree of the conflict rule base and the logical consistency weight. The logical consistency weight is preset according to the logical correlation strength of the sub-tasks to quantify the risk of logical conflict in the sub-tasks.

S4: Determine the weighting coefficient based on the project type identifier; determine the project cycle consistency based on the relationship between the reimbursement time and the project cycle; calculate the risk score based on the corrected ambiguity, matching score, project cycle consistency, time sensitivity, and conflict probability.

S5: Risk score determination: if the risk score is greater than the preset threshold, the application is marked as high risk; otherwise, the reimbursement application is approved. Structured audit logs and a chain of responsibility are established: the risk scoring result, the parameters (including the corrected ambiguity, conflict probability, and time sensitivity), and the decision-making basis are recorded in the audit logs to form a traceable chain of responsibility, ensuring that the audit process complies with the compliance requirements of university scientific research management.

[0009] Furthermore, in S1, the structured extraction of data sources, the field standardization, and the filtering of invalid text include the following.

The extracted expense description text and amount are converted from common fields to a standardized format through field mapping rules: the expense description text retains the semantic tags of university scientific research reimbursement, and the amount field is uniformly converted to RMB yuan.

The project subtask relationship diagram is in a structured data format, including [subtask A → associated subtask B], which is used to describe the logical dependencies between subtasks.

Invalid text is filtered based on the feature database of university research reimbursement. The feature database includes an invalid word database and a valid word database. The invalid word database includes general words that are unrelated to the research content, and the valid word database includes words specific to the research scenario. When the reimbursement description text contains words from the invalid word database, it is directly marked as high risk and no further calculation is required; otherwise, the text is retained.

The calculation of the project completion date includes obtaining the completion date from the project management system, calculating the number of days remaining until completion, and marking the period as a sensitive period for project completion when the number of days remaining is less than a preset threshold. All date fields are uniformly converted to YYYY-MM-DD format, i.e., year-month-day format.

The project cycle verification checks whether the reimbursement date is within the project cycle, which is provided by the project management system. The subtask association diagram verification verifies whether the subtask identifier of the reimbursement document exists in the project subtask association diagram. When the verification fails, the application is marked as high risk to prevent invalid data from entering the subsequent process.

[0010] Furthermore, in S2, loading the research project terminology database, extracting keywords and generating vectors, calculating ambiguity, determining time sensitivity, and calculating the corrected ambiguity include the following.

According to the project type identifier, the corresponding terminology database is loaded from the scientific research terminology database file stored locally in the university. The terminology database file is a structured text file, and keywords and their probability distribution data in the university scientific research scenario are generated through the Word2Vec model. The generated keyword vector is obtained by converting the keyword list into a word frequency vector, with the dimension equal to the total number of keywords in the terminology database and the vector elements being the embedding vectors of the keywords in the Word2Vec model.

The ambiguity is calculated based on the probability distribution of keywords in the terminology database to quantify the semantic ambiguity of the reimbursement description. Ambiguity = [formula not reproduced in this text], where the probability term is the semantic probability generated by the LSTM model and n is the total number of keywords in the reimbursement description text that belong to the terminology database. The training data for the LSTM neural network model consists of five years of historical expense reimbursement texts from universities, including expense descriptions and amounts. The model input is the expense description text for each expense reimbursement, and the model output is the keyword probability distribution.

The time sensitivity is determined based on the completion date calculation result output in S1. If the project is in the completion stage and the remaining days are less than the preset threshold, the time sensitivity is 1. Otherwise, the time sensitivity is 0.5. Corrected ambiguity = Ambiguity × Time sensitivity. The impact of ambiguity remains unchanged during the project completion phase, but is halved during the non-completion phase.

[0011] Furthermore, in S3, extracting the associated subtask identifier, querying the logical conflict rule base for scientific research subtasks, and calculating the conflict probability include: The subtask identifiers of expense reports are matched with the subtask list in the association diagram using field mapping rules. The logical relationship is queried from the preset logical conflict rule base of scientific research sub-tasks stored locally in the university. The rule base is a CSV file, generated from historical project conflict data based on the random forest model, and contains only scientific research scenario rules. If the rule base contains [current subtask, associated subtask, conflict], the rule base matching degree is 1; if it contains a [consistent] record, the rule base matching degree is 0; if there is no matching record, the rule base matching degree is 1. The logical consistency weight is preset based on the logical correlation strength of the subtasks. When the correlation is strong, i.e., a direct experimental process, the logical consistency weight is 0.8. When the correlation is weak, i.e., an indirect supporting task, the logical consistency weight is 0.3. Conflict probability = rule base matching degree × logical consistency weight.
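The S3 calculation can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the in-memory dictionary stands in for the CSV rule base, and the names `RULES`, `rule_match_degree`, and `conflict_probability` are hypothetical. Only the numeric values (1, 0, 1 for the matching degree; 0.8 and 0.3 for the weights) come from the text.

```python
from typing import Dict, Optional, Tuple

# Logical consistency weights stated in the text.
STRONG_WEIGHT = 0.8  # strong correlation: direct experimental process
WEAK_WEIGHT = 0.3    # weak correlation: indirect supporting task

# Hypothetical stand-in for the CSV rule base:
# (current subtask, associated subtask) -> "conflict" or "consistent".
RULES: Dict[Tuple[str, str], str] = {
    ("Data Collection", "Experiment Preparation"): "consistent",
    ("Data Collection", "Final Report"): "conflict",
}


def rule_match_degree(label: Optional[str]) -> float:
    """Map a rule-base lookup result to the rule base matching degree."""
    if label == "consistent":
        return 0.0
    # Per the text, both an explicit conflict and a missing record map to 1.
    return 1.0


def conflict_probability(current: str, associated: str, strong: bool) -> float:
    """Conflict probability = rule base matching degree x logical consistency weight."""
    label = RULES.get((current, associated))  # None when no record matches
    weight = STRONG_WEIGHT if strong else WEAK_WEIGHT
    return rule_match_degree(label) * weight
```

Note that, as written in the text, a missing record is treated the same as an explicit conflict, so an unknown subtask pair contributes the full weight to the conflict probability.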

[0012] Furthermore, in step S4, a weighting coefficient is determined based on the project type identifier, project cycle consistency is determined based on the relationship between reimbursement time and project cycle, and a risk score is calculated, including: The weight coefficient k is loaded according to the project type identifier. The weight coefficient corresponding to the national level project is 0.3, the weight coefficient corresponding to the provincial and ministerial level project is 0.2, and the weight coefficient corresponding to the school level project is 0.1. The project cycle consistency is calculated based on the project cycle information output by S1 and the reimbursement time. If the reimbursement time is within the project cycle, the project cycle consistency is 1; if the reimbursement time is not within the project cycle, the project cycle consistency is 0. Risk score = [0.7 + k × (1 - corrected ambiguity)] × (1 - matching score) + 0.3 × (1 - project cycle consistency) × time sensitivity + 0.5 × conflict probability.
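As a worked illustration of the S4 formula, the following Python sketch evaluates the risk score exactly as stated above. The function and dictionary names are hypothetical, and the threshold is left as a required argument because the text only says it is preset from pilot data without giving a value.

```python
# Weight coefficient k per project type, as stated in the text.
WEIGHT_BY_PROJECT_TYPE = {
    "national": 0.3,
    "provincial_ministerial": 0.2,
    "school": 0.1,
}


def risk_score(
    project_type: str,
    corrected_ambiguity: float,
    matching_score: float,
    cycle_consistency: int,
    time_sensitivity: float,
    conflict_probability: float,
) -> float:
    """Risk score = [0.7 + k(1 - corrected ambiguity)](1 - matching score)
    + 0.3(1 - project cycle consistency) x time sensitivity
    + 0.5 x conflict probability."""
    k = WEIGHT_BY_PROJECT_TYPE[project_type]
    return (
        (0.7 + k * (1 - corrected_ambiguity)) * (1 - matching_score)
        + 0.3 * (1 - cycle_consistency) * time_sensitivity
        + 0.5 * conflict_probability
    )


def is_high_risk(score: float, threshold: float) -> bool:
    """S5 decision: high risk only when the score is greater than the threshold."""
    return score > threshold
```

For example, a national-level project with corrected ambiguity 0.5, matching score 0.8, reimbursement inside the project cycle, time sensitivity 1, and no conflict gives (0.7 + 0.3 × 0.5) × 0.2 = 0.17.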

[0013] Furthermore, the risk scoring and determination in S5, the structured recording of audit logs, and the formation of a chain of responsibility include: The risk scoring preset threshold is set based on data from a pilot program for university research reimbursement, and the decision-making basis is recorded in a quantitative description. Regardless of whether it is high-risk or automatically approved, the above fields must be recorded in full. If it is high-risk, the person who performed the manual review and the time must also be recorded. Based on the identity mapping of the project management system, log records are bound to the reimbursement applicant, project leader, and reviewer, generating a unique responsibility chain ID for audit traceability; logs are automatically synchronized to the university's scientific research audit platform.
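One possible shape for such a log record is sketched below. The field names, the `AuditLogEntry` class, and the use of a random UUID as the responsibility chain ID are all assumptions made for illustration; the text only requires that the parameters, decision basis, bound identities, and a unique chain ID be recorded.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class AuditLogEntry:
    """Hypothetical structured audit log record for one reimbursement."""

    applicant: str
    project_leader: str
    risk_score: float
    corrected_ambiguity: float
    conflict_probability: float
    time_sensitivity: float
    decision: str            # e.g. "high_risk" or "approved"
    decision_basis: str      # quantitative description of the decision
    reviewer: Optional[str] = None            # filled in for high-risk cases
    manual_review_time: Optional[str] = None  # filled in for high-risk cases
    # Assumption: a random UUID serves as the unique responsibility chain ID.
    chain_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def to_json(self) -> str:
        """Serialize the record, keeping non-ASCII names readable."""
        return json.dumps(asdict(self), ensure_ascii=False)
```

A record serialized this way could then be pushed to whatever audit platform the university uses; the synchronization mechanism itself is not described in the text.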

[0014] The beneficial effects of this invention are as follows: By integrating Word2Vec, LSTM and random forest models to build an end-to-end AI-driven intelligent audit framework, the financial reimbursement audit of universities is upgraded from the traditional manual experience mode to a dynamic, quantitative and traceable intelligent system, which achieves a leap in audit efficiency and compliance risk prevention and control capabilities.

[0015] The Word2Vec model transforms expense reimbursement text and project budget into high-dimensional semantic vectors, breaking through the semantic limitations of traditional keyword matching. This allows the matching score to truly reflect the semantic consistency between the expense description and the project budget, avoiding misjudgments caused by differences in expression. The LSTM model learns the keyword probability distribution based on historical expense reimbursement data, quantifies semantic ambiguity, and associates it with the sensitive period of project completion. It can proactively identify high-risk ambiguous expense reimbursements and reduce compliance loopholes caused by semantic ambiguity. The Random Forest model automatically mines sub-task logic rules from historical project conflict data and builds an updatable conflict rule base, avoiding the lag and subjective bias of manual rule formulation.

[0016] The risk scoring formula balances the weights of key risk factors. The introduction of ambiguity correction quantifies the amplification effect of vague descriptions during the project completion phase, prioritizing the identification of high-risk scenarios. The weight coefficient k adjusts the review rigor based on the project level, ensuring higher weights for high-risk scenarios such as national-level projects, avoiding blind spots caused by a one-size-fits-all approach. Furthermore, the formula achieves weighted calculation of risk factors by weighting and integrating matching scores, project cycle consistency, and conflict probability, ensuring that the risk score reflects both the compliance of reimbursement content and aligns with the phased characteristics of project management.

[0017] Overall, this method not only improves review efficiency but also fundamentally solves long-standing pain points in the review of university research funding, such as the difficulty in identifying vague descriptions, the high risk of time-sensitive periods, and the difficulty in tracing logical conflicts in sub-tasks. It ensures that the compliance of each reimbursement is quantifiable and auditable, builds a closed-loop financial risk control ecosystem for universities, reduces the incidence of violations, and achieves full-process transparent management through structured responsibility chain logs.

Attached Figure Description

[0018] Figure 1 is a flowchart of an AI-based intelligent financial audit and compliance risk control method for universities.

Detailed Implementation

[0019] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0020] In order to achieve the above objectives, Figure 1 provides a flowchart of an AI-based intelligent financial audit and compliance risk prevention method for universities.

[0021] Furthermore, the term "and / or" in this article is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. Additionally, the character " / " in this article, unless otherwise specified, generally indicates that the preceding and following related objects have an "or" relationship.

[0022] Example 1

An AI-based intelligent financial auditing and compliance risk control method for universities, employing the following technical solution:

S1: Structured data source extraction: extract the expense description text and amount from the university's financial system; extract the budget details, project type identifier, project stage identifier, completion date, and project subtask relationship diagram from the project management system. Perform field standardization on the data: unify the date format and the monetary unit, and filter invalid text to ensure the semantic consistency and calculation basis of the input data.

S2: Load the research project terminology database: load the corresponding research project terminology database according to the project type identifier. Keyword extraction and vector generation: based on the Word2Vec model, keywords belonging to the terminology database are extracted from the reimbursement description text, and reimbursement keyword vectors are generated; based on the Word2Vec model, keywords belonging to the terminology database are extracted from the project application budget details text, and project keyword vectors are generated. Calculate the matching score: calculate the cosine similarity between the reimbursement keyword vector and the project keyword vector to obtain the matching score. Ambiguity calculation: ambiguity is calculated based on the probability distribution of keywords in the terminology database; the probability distribution is learned from historical expense reimbursement texts through an LSTM neural network model. Determine time sensitivity: determine time sensitivity based on the project stage identifier and completion date. Calculate the corrected ambiguity: calculate the corrected ambiguity to quantify the amplification effect of ambiguity in the final stage of the project.

S3: Extract and associate subtask identifiers: extract the current subtask identifier from the subtask field of the expense report and obtain the list of associated subtasks from the project management system. Query the logical conflict rule base for scientific research subtasks: based on the preset logical conflict rule base for scientific research subtasks, which is generated from historical project conflict data based on the random forest model, check the logical relationship between the current subtask and related subtasks. Calculate the conflict probability: the conflict probability is calculated based on the matching degree of the conflict rule base and the logical consistency weight. The logical consistency weight is preset according to the logical correlation strength of the sub-tasks to quantify the risk of logical conflict in the sub-tasks.

S4: Determine the weighting coefficient based on the project type identifier; determine the project cycle consistency based on the relationship between the reimbursement time and the project cycle; calculate the risk score based on the corrected ambiguity, matching score, project cycle consistency, time sensitivity, and conflict probability.

S5: Risk score determination: if the risk score is greater than the preset threshold, the application is marked as high risk; otherwise, the reimbursement application is approved. Structured audit logs and a chain of responsibility are established: the risk scoring result, the parameters (including the corrected ambiguity, conflict probability, and time sensitivity), and the decision-making basis are recorded in the audit logs to form a traceable chain of responsibility, ensuring that the audit process complies with the compliance requirements of university scientific research management.

[0023] Furthermore, in S1, the data source is extracted in a structured manner, and the data undergoes field standardization processing to filter invalid text, including: The extracted expense description text and amount are converted into a standardized format using field mapping rules: the expense description text retains semantic tags for university research reimbursement, including experimental material costs and equipment maintenance costs, to distinguish between research activities and non-research activities; Convert all amount fields to RMB, including converting USD amounts to RMB using the exchange rate, to ensure that all amount data is compared and calculated in the same unit.

[0024] Invalid text is filtered based on the feature database of university research reimbursement, which includes an invalid word database and a valid word database. The invalid word database includes general terms unrelated to research content, such as office supplies, meals, and transportation expenses; in research reimbursement, these terms usually lack specific relevance to research activities. If the reimbursement description includes a term from the invalid word database, such as office supplies, it is directly marked as high risk and no further calculation is required. The valid word database includes terms specific to scientific research scenarios, such as experimental materials, equipment maintenance, data acquisition, and sample preparation. If the reimbursement description includes a term from the valid word database, such as the purchase of experimental materials, the text is retained for subsequent calculations.
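The screening rule above can be illustrated with a small Python sketch. It assumes English phrases and plain substring matching purely for readability; the original method works on segmented Chinese text, and the function name `screen_description` and the return labels are hypothetical. The word lists are the examples given in the text.

```python
from typing import List, Tuple

# Example entries from the text; a real feature database would be larger.
INVALID_WORDS = {"office supplies", "meals", "transportation expenses"}
VALID_WORDS = {
    "experimental materials",
    "equipment maintenance",
    "data acquisition",
    "sample preparation",
}


def screen_description(text: str) -> Tuple[str, List[str]]:
    """Return ("high_risk", []) if any invalid word appears; otherwise
    ("retained", matched valid keywords in alphabetical order)."""
    lowered = text.lower()
    if any(word in lowered for word in INVALID_WORDS):
        # Marked high risk directly; no further calculation is performed.
        return "high_risk", []
    keywords = sorted(word for word in VALID_WORDS if word in lowered)
    return "retained", keywords
```

Checking the invalid words first mirrors the text: an invalid term short-circuits the pipeline even when valid research terms are also present.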

[0025] The calculation of the project completion date includes obtaining the completion date from the project management system, in a format such as 2023-12-31, and calculating the number of days remaining until the project completion date, which is the difference in days between the current date and the completion date. When the remaining days are less than the preset threshold, it is marked as the sensitive period for project completion. The preset threshold is usually set at 30 days. The 30 days before the project completion is the preparation period for the completion of university research projects. Reimbursement behavior may affect the completion progress or compliance. The 30-day threshold is based on the experience of university research management. Convert date fields to YYYY-MM-DD format (year-month-day format), such as converting 2023 / 12 / 31 to 2023-12-31.
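The date handling described here is simple enough to sketch with the Python standard library. The function names are hypothetical; the 30-day threshold and the YYYY-MM-DD target format come from the text. The text does not say how a completion date that has already passed should be treated, so the sketch applies the stated rule literally (remaining days below the threshold count as sensitive).

```python
from datetime import date, datetime

SENSITIVE_DAYS = 30  # preset threshold given in the text


def normalize_date(raw: str) -> str:
    """Convert inputs such as '2023 / 12 / 31' or '2023/12/31' to 'YYYY-MM-DD'."""
    cleaned = raw.replace(" ", "").replace("/", "-")
    return datetime.strptime(cleaned, "%Y-%m-%d").date().isoformat()


def days_remaining(completion: str, today: date) -> int:
    """Days from today until the project completion date."""
    completion_date = date.fromisoformat(normalize_date(completion))
    return (completion_date - today).days


def in_sensitive_period(completion: str, today: date,
                        threshold: int = SENSITIVE_DAYS) -> bool:
    """True when the remaining days are less than the threshold."""
    return days_remaining(completion, today) < threshold
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the check reproducible when a reimbursement is re-audited later.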

[0026] Project cycle verification checks whether the reimbursement date falls within the project cycle. The project cycle is provided by the project management system. If the reimbursement date falls within the project cycle, the project cycle consistency is 1; otherwise, it is 0. The subtask association diagram verification checks whether the subtask identifier of the expense report exists in the project subtask association diagram. When the verification fails, it is marked as high risk to prevent invalid data from entering the subsequent process. The project subtask association diagram is in a structured data format, including [Subtask A → Associated Subtask B], which is used to describe the logical dependency relationship between subtasks; When the reimbursement document subtask identifier, such as data collection, exists in the association diagram, [Experiment Preparation → Data Collection], the verification passes; otherwise, it is marked as high risk.
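The two checks in this paragraph can be sketched as follows. The edge-list representation of the association diagram and the function names are assumptions for illustration, and treating the cycle's start and end dates as inclusive is also an assumption, since the text does not specify the boundary behavior.

```python
from datetime import date
from typing import Iterable, Tuple

# Each edge is (subtask, associated subtask), mirroring [Subtask A -> Associated Subtask B].
Edge = Tuple[str, str]


def subtask_in_diagram(subtask_id: str, edges: Iterable[Edge]) -> bool:
    """True when the subtask appears on either side of any edge."""
    return any(subtask_id in edge for edge in edges)


def cycle_consistency(reimbursement: date, start: date, end: date) -> int:
    """1 if the reimbursement date is within the project cycle, else 0.
    Assumption: both boundary dates count as inside the cycle."""
    return 1 if start <= reimbursement <= end else 0
```

With the example from the text, the diagram `[("Experiment Preparation", "Data Collection")]` contains the identifier "Data Collection", so that expense report passes verification.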

[0027] In S2, loading the scientific research project terminology database, extracting keywords and generating vectors, calculating the ambiguity, determining the time sensitivity, and calculating the corrected ambiguity include: loading the corresponding terminology database from the scientific research terminology database file stored locally in the university according to the project type identifier. The terminology database file is a structured text file such as CSV, and each line contains a keyword and its probability distribution data in the university scientific research scenario, for example "Experimental Materials, 0.72" and "Equipment Maintenance, 0.65".

[0028] The training data of the LSTM neural network model is the historical reimbursement text of the university for 5 years, including expense descriptions and amounts. The data preprocessing includes using jieba segmentation (a special dictionary for university scientific research texts), processing effective scientific research vocabulary, and annotating the project type for each reimbursement, such as Sample Preparation → National-level Project; Original text: Sample Preparation Experimental Material Procurement → After segmentation: [Sample, Preparation, Experiment, Material, Procurement] → Keyword extraction: [Experimental Materials] (based on matching with the effective word library); Original text: Equipment Routine Maintenance Expenses → After segmentation: [Equipment, Routine, Maintenance, Expenses] → Keyword extraction: [Equipment Maintenance] (based on matching with the effective word library).

[0029] The model input is the expense description text of each reimbursement, such as Procurement of Experimental Materials, Equipment Maintenance. The model output is the keyword probability distribution; the input dimension is the word vector dimension (100), the number of LSTM units is 128, and the number of training rounds is 50; The reimbursement text has context dependence. For example, the probability of Experimental Materials is high in the Sample Preparation project, and the probability of Equipment Maintenance is high in the Instrument Maintenance project. LSTM can effectively capture this semantic association and is more accurate than simply counting frequencies; Text Sample Preparation Experimental Material Procurement → Probability of Experimental Materials = 0.85, high; Text Equipment Routine Maintenance Expenses → Probability of Equipment Maintenance = 0.78, high.

[0030] Keyword vectors are generated using a Word2Vec model pre-trained on the university scientific research corpus; generating a keyword vector converts the keyword list into a word frequency vector. The corpus source is the historical data of the university scientific research system, including 5 years of reimbursement texts, project application forms, and abstracts of scientific research papers. Corpus preprocessing removes stop words such as "of" and "and", retains scientific research keywords such as Experimental Materials and Data Collection, and merges synonyms such as Equipment Maintenance and Equipment Repair. Dimension = total number of keywords in the terminology database: if the terminology database contains 100 keywords, then the vector dimension = 100. The vector elements are the embedding vectors of the keywords in the Word2Vec model. The window size is 5, the minimum word frequency is 5, and the model type is Skip-gram.

[0031] For example, the pre-trained Word2Vec model is loaded; for the reimbursement description "purchasing experimental materials and repairing equipment", the keyword list [experimental materials, equipment repair] is extracted; the vector is obtained as word2vec_model[experimental materials] → [0.35, -0.21, 0.78, ...]; and a 100-dimensional vector is generated, whose dimension equals the total number of keywords in the terminology database: [0.35, -0.21, 0.78, ..., 0.12, 0.45].
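
Step S2 of claim 1 obtains the matching score as the cosine similarity between the reimbursement keyword vector and the project keyword vector. A minimal sketch in plain Python follows. The three-element vectors are illustrative values only, not real 100-dimensional embeddings, and returning 0.0 for an all-zero vector is an assumption not stated in the text.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an empty keyword vector matches nothing
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


reimbursement_vec = [0.35, -0.21, 0.78]  # illustrative values only
project_vec = [0.30, -0.25, 0.80]        # illustrative values only
matching_score = cosine_similarity(reimbursement_vec, project_vec)
print(round(matching_score, 3))
```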

[0032] The ambiguity is calculated from the probability distribution of keywords in the terminology database and quantifies the semantic ambiguity of the reimbursement description. The ambiguity formula is not reproduced in this text; in it, the semantic probability of each keyword is generated by the LSTM model and reflects how typical the keyword is: the higher the probability, the clearer the semantics. n is the total number of keywords in the reimbursement description text that belong to the terminology database; only valid research terms are counted, and invalid terms are excluded. Ambiguity = 0 means that the probability of each keyword in the reimbursement description is 1 and the semantics are absolutely clear; ambiguity = 1 means that the probabilities of all keywords are equal and the semantics are completely ambiguous. For example, if the reimbursement description is "purchasing office supplies", the invalid words have been filtered out and no keywords belong to the terminology database, so n = 0 and ambiguity = 0; however, such cases have already been marked as high risk in S1 and do not enter S2.
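
Because the ambiguity formula itself is missing from the text, the sketch below shows only one candidate that satisfies the stated boundary conditions: a normalized Shannon entropy over the per-keyword probabilities. It should not be read as the formula of the invention. The assumptions are that "equal probabilities" means each of the n keywords has probability 1/n, that the result is clamped to [0, 1], and that n = 0 and n = 1 both return 0. The function name `ambiguity` is hypothetical.

```python
import math


def ambiguity(probabilities):
    """Candidate ambiguity: entropy of the keyword probabilities divided by log(n).

    Assumed form, chosen so that all-ones gives 0 and uniform 1/n gives 1.
    """
    n = len(probabilities)
    if n <= 1:
        return 0.0  # n = 0 is stated in the text; n = 1 is an assumption
    entropy = -sum(p * math.log(p) for p in probabilities if p > 0.0)
    return min(1.0, max(0.0, entropy / math.log(n)))


print(ambiguity([1.0, 1.0]))   # every keyword certain
print(ambiguity([0.5, 0.5]))   # keywords equally likely
print(ambiguity([]))           # no terminology keywords
```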

[0033] The time sensitivity is determined from the completion date calculation result output in S1. When the project is in the completion stage and the remaining days are fewer than the preset threshold, the time sensitivity is 1; in all other cases, such as the research or preparation period, the time sensitivity is 0.5. The preset threshold is 30 days, which is the general threshold for the completion-sensitive period in university scientific research management. Corrected ambiguity = ambiguity × time sensitivity. During the project completion stage, the reimbursement risk is high and the impact of ambiguity remains unchanged; during the non-completion stage, the risk is lower and the impact of ambiguity is halved.
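
The time sensitivity rule and the ambiguity correction can be sketched as below. The stage label "completion" and the function names are hypothetical; the 30-day threshold and the values 1 and 0.5 come from the paragraph above.

```python
COMPLETION_THRESHOLD_DAYS = 30  # preset threshold from the text


def time_sensitivity(stage, remaining_days, threshold=COMPLETION_THRESHOLD_DAYS):
    """Return 1.0 inside the completion-sensitive window, 0.5 otherwise."""
    if stage == "completion" and remaining_days < threshold:
        return 1.0
    return 0.5


def corrected_ambiguity(ambiguity, sensitivity):
    """Corrected ambiguity = ambiguity x time sensitivity."""
    return ambiguity * sensitivity


print(corrected_ambiguity(0.4, time_sensitivity("completion", 10)))  # impact unchanged
print(corrected_ambiguity(0.4, time_sensitivity("research", 200)))   # impact halved
```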

[0034] Furthermore, in S3, extracting the associated subtask identifier, querying the logical conflict rule base for scientific research subtasks, and calculating the conflict probability include the following. The subtask identifier of the expense report is matched against the subtask list in the association diagram using field mapping rules. The logical correlation is queried from the preset logical conflict rule base for scientific research subtasks, which is stored locally at the university. The rule base is a CSV file generated from historical project conflict data by a random forest model, and contains only scientific research scenario rules. The training data for the random forest model consists of five years of the university's historical project conflict data, including subtask relationships and conflict labels, such as experiment preparation → data collection, conflict. During data cleaning, records with missing subtask labels are removed, subtask names are standardized (variant names of experiment preparation are mapped to one canonical name), and conflict labels are added manually. The model input includes features such as subtask type, project stage, and task correlation strength, for example experiment preparation (type = experiment) and data collection (type = experiment). Random forest configuration: the number of trees is 100, the maximum tree depth is 5, the minimum number of samples required to split a node is 5, and the class weights are {conflict: 1.5, consistent: 1.0}.

[0035] If the rule base contains [current subtask, associated subtask, conflict], the rule base matching degree = 1, indicating a clear conflict and high risk; if it contains [current subtask, associated subtask, consistent], the rule base matching degree = 0, indicating no conflict and no risk; if there is no matching record, the rule base matching degree = 1, indicating that the relationship cannot be determined and is treated as high risk.

[0036] The weight values are preset according to the logical correlation strength of the subtasks. When the correlation is strong, i.e., the subtasks form a direct experimental process such as experimental preparation → data collection, the risk is high and the logical consistency weight is 0.8. When the correlation is weak, i.e., the relationship is an indirect support task such as equipment maintenance → data analysis, the risk is lower and the logical consistency weight is 0.3.

[0037] Conflict probability = rule base matching degree × logical consistency weight. Scenario 1: the expense report subtask is data collection and the associated subtask is experimental preparation; the rule base contains [data collection, experimental preparation, conflict] → matching degree = 1, strong association weight = 0.8 → conflict probability = 1 × 0.8 = 0.8. Scenario 2: the expense report subtask is equipment maintenance and the associated subtask is data analysis; the rule base contains [equipment maintenance, data analysis, consistent] → matching degree = 0, weak association weight = 0.3 → conflict probability = 0 × 0.3 = 0. Scenario 3: the expense report subtask is sample preparation and the associated subtask is data collection; the rule base has no matching record → matching degree = 1, strong association weight = 0.8 → conflict probability = 1 × 0.8 = 0.8 (high risk, triggering manual review).
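
The rule base lookup and the three scenarios can be reproduced with a short sketch. The text states only that the rule base is a CSV file, so the column names, the in-memory CSV content, and the function names below are assumptions made for illustration.

```python
import csv
import io

# Hypothetical CSV layout; only the CSV format itself is given in the text.
RULE_BASE_CSV = """current_subtask,associated_subtask,relation
data collection,experimental preparation,conflict
equipment maintenance,data analysis,consistent
"""

STRONG_WEIGHT = 0.8  # direct experimental process
WEAK_WEIGHT = 0.3    # indirect support task


def load_rules(csv_text):
    """Map (current subtask, associated subtask) to the recorded relation."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(r["current_subtask"], r["associated_subtask"]): r["relation"] for r in reader}


def matching_degree(rules, current, associated):
    """0 only for a recorded 'consistent' pair; conflicts and missing pairs give 1."""
    return 0 if rules.get((current, associated)) == "consistent" else 1


def conflict_probability(rules, current, associated, strong_association):
    """Conflict probability = matching degree x logical consistency weight."""
    weight = STRONG_WEIGHT if strong_association else WEAK_WEIGHT
    return matching_degree(rules, current, associated) * weight


rules = load_rules(RULE_BASE_CSV)
print(conflict_probability(rules, "data collection", "experimental preparation", True))
print(conflict_probability(rules, "equipment maintenance", "data analysis", False))
print(conflict_probability(rules, "sample preparation", "data collection", True))
```

Treating a missing record the same as a conflict follows the rule in the text that an undeterminable relationship is considered high risk.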

[0038] Furthermore, in step S4, determining the weighting coefficient based on the project type identifier, determining project cycle consistency from the relationship between the reimbursement time and the project cycle, and calculating the risk score include the following. The weighting coefficient k is set according to the project type identifier: 0.3 for national-level projects, 0.2 for provincial / ministerial-level projects, and 0.1 for university-level projects. In university research management practice, violations in national-level projects may lead to financial audit issues and require close monitoring, whereas university-level projects carry relatively lower risk and their weight is reduced accordingly. The setting of k follows the compliance management principle that the higher the project level, the greater the risk impact.

[0039] Project cycle consistency is calculated based on the project cycle information output by S1 and the reimbursement time. If the reimbursement time is within the project cycle, the project cycle consistency is 1, indicating that the reimbursement time matches the effective project cycle; if the reimbursement time is not within the project cycle, the project cycle consistency is 0.
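
A minimal sketch of the project cycle consistency check follows. The dates are invented for illustration, and treating the first and last day of the cycle as inside the cycle is an assumption; the text does not say how boundary dates are handled.

```python
from datetime import date


def project_cycle_consistency(reimbursement_date, cycle_start, cycle_end):
    """Return 1 if the reimbursement date falls within the project cycle, else 0."""
    # Assumption: both endpoints of the cycle count as inside the cycle.
    return 1 if cycle_start <= reimbursement_date <= cycle_end else 0


cycle_start = date(2024, 1, 1)    # hypothetical project cycle
cycle_end = date(2025, 12, 31)
print(project_cycle_consistency(date(2025, 11, 15), cycle_start, cycle_end))
print(project_cycle_consistency(date(2026, 2, 1), cycle_start, cycle_end))
```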

[0040] Risk score = [0.7 + k × (1 - corrected ambiguity)] × (1 - matching score) + 0.3 × (1 - project cycle consistency) × time sensitivity + 0.5 × conflict probability. Here 0.7 is the basic weight, since a basic risk assessment must be retained when reviewing scientific research reimbursements in universities; 0.3 is the fixed weight of the project cycle term; and 0.5 is the fixed weight of the sub-task conflict term. Example 1, compliant reimbursement for a national-level project: k = 0.3, corrected ambiguity = 0.2, matching score = 0.9, project cycle consistency = 1, time sensitivity = 0.5, conflict probability = 0, giving a risk score of 0.094. Example 2, reimbursement during the completion-sensitive period for a provincial / ministerial-level project: k = 0.2, corrected ambiguity = 0.5, matching score = 0.6, project cycle consistency = 0, time sensitivity = 1, conflict probability = 0.8, giving a risk score of 1.02. Example 3, reimbursement within the project cycle for a university-level project: k = 0.1, corrected ambiguity = 0.2, matching score = 0.9, project cycle consistency = 1, time sensitivity = 0.5, conflict probability = 0.3, giving a risk score of 0.228 ([0.7 + 0.1 × 0.8] × 0.1 + 0 + 0.5 × 0.3 = 0.078 + 0.15).
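
The risk score formula can be checked numerically with a direct transcription. The function and parameter names are hypothetical; the constants 0.7, 0.3, and 0.5 and the input values are those given for the national-level and provincial / ministerial-level examples.

```python
def risk_score(k, corrected_ambiguity, matching_score,
               cycle_consistency, time_sensitivity, conflict_probability):
    """Transcription of the risk score formula from the text."""
    base_term = (0.7 + k * (1 - corrected_ambiguity)) * (1 - matching_score)
    cycle_term = 0.3 * (1 - cycle_consistency) * time_sensitivity
    conflict_term = 0.5 * conflict_probability
    return base_term + cycle_term + conflict_term


# National-level project, compliant reimbursement.
national = risk_score(0.3, 0.2, 0.9, 1, 0.5, 0.0)
# Provincial / ministerial-level project, completion-sensitive period.
provincial = risk_score(0.2, 0.5, 0.6, 0, 1, 0.8)
print(round(national, 3), round(provincial, 3))  # 0.094 1.02
```

Splitting the formula into three named terms makes it easy to record in the audit log which term drove a high score.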

[0041] Furthermore, in S5, risk scoring and determination, structured recording of audit logs, and formation of the chain of responsibility include the following. The preset risk score threshold is set by each university from its historical data and customized using local pilot data from university research reimbursement programs. The decision basis is recorded in quantitative terms. Once the threshold is set, the following judgment is executed automatically: if the risk score > the threshold, the reimbursement is marked as high risk; otherwise, it is approved.

[0042] Regardless of whether a reimbursement is marked as high risk or automatically approved, the log fields must be fully recorded, including the risk score, corrected ambiguity, time sensitivity, and decision basis; for high-risk cases, the manual reviewer and the review time are also recorded.

[0043] Based on the identity mapping of the project management system, the log records are bound to the reimbursement applicant, project leader, and reviewer by associating the project management system with the university's identity identifiers. A unique responsibility chain ID is generated from a UUID plus a timestamp for audit traceability, and the logs are automatically synchronized to the university's scientific research audit platform.
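
A structured log record with a responsibility chain ID might be assembled as sketched below. The ID layout (prefix RC, date stamp, six hexadecimal digits of a UUID) is an assumption modeled on the example ID RC-20251115-7a8b9c that appears later in the text. The field names, function names, and identity values are hypothetical.

```python
import json
import uuid
from datetime import datetime, timezone


def new_chain_id(now=None):
    """Build an ID from a date stamp plus part of a random UUID (assumed layout)."""
    now = now or datetime.now(timezone.utc)
    # Six hex digits mirror the example ID in the text; keep the full
    # uuid4().hex instead if stronger uniqueness is required.
    return f"RC-{now:%Y%m%d}-{uuid.uuid4().hex[:6]}"


def build_log_record(risk_score, corrected_ambiguity, time_sensitivity,
                     decision_basis, applicant, project_leader,
                     high_risk, reviewer=None, review_time=None):
    """Collect the fields that must be logged for every reimbursement."""
    record = {
        "chain_id": new_chain_id(),
        "risk_score": risk_score,
        "corrected_ambiguity": corrected_ambiguity,
        "time_sensitivity": time_sensitivity,
        "decision_basis": decision_basis,
        "applicant": applicant,
        "project_leader": project_leader,
        "high_risk": high_risk,
    }
    if high_risk:
        # Reviewer and review time are recorded only for high-risk cases.
        record["reviewer"] = reviewer
        record["review_time"] = review_time
    return record


record = build_log_record(1.02, 0.5, 1.0, "risk score 1.02 > threshold 0.45",
                          "applicant-001", "leader-001", True,
                          reviewer="reviewer-001", review_time="2025-11-15 14:30")
print(json.dumps(record, indent=2))
```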

[0044] Example 2: This embodiment further explains the threshold differences between different types of universities (such as 985 / 211 universities and ordinary universities). The preset risk score threshold is set based on pilot data of university research reimbursement.

[0045] There is an inverse correlation between the type of university and the risk scoring threshold. For 985 / 211 universities, research project funding is large and supervision is strict, so risk tolerance is low and the risk scoring threshold is 0.55; for ordinary universities, research project funding is relatively small, risk tolerance is slightly higher, and the risk scoring threshold is 0.45.

[0046] Scenario 1 involves a 985 university, a national-level project, and a compliant reimbursement: the project type is national-level, k = 0.3; corrected ambiguity is 0.2 (the reimbursement description is semantically clear); matching score is 0.9 (the reimbursement content closely matches the project budget); project cycle consistency is 1 (the reimbursement time is within the project cycle); time sensitivity is 0.5 (not in the project completion period); conflict probability is 0.0 (no sub-task logic conflict); the calculated risk score is 0.094 < 0.55 (threshold for 985 universities) → reimbursement application approved.

[0047] Scenario 2 involves an ordinary university, a provincial / ministerial-level project, and a reimbursement during the completion-sensitive period: the project type is provincial / ministerial-level, k = 0.2; corrected ambiguity is 0.5 (the reimbursement description is semantically vague); matching score is 0.6 (the reimbursement content moderately matches the project budget); project cycle consistency is 0 (the reimbursement time is not within the project cycle); time sensitivity is 1 (project completion period); conflict probability is 0.8 (sub-task logic conflict); the calculated risk score is 1.02 > 0.45 (threshold for ordinary universities) → marked as high risk, triggering manual review.
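
The threshold decision applied in Scenarios 1 and 2 can be sketched as a lookup followed by a comparison. The dictionary keys and the function name are hypothetical; the thresholds 0.55 and 0.45 and the strict "greater than" comparison are taken from the scenarios and from S5.

```python
# Thresholds as used in Scenarios 1 and 2; the key names are hypothetical.
THRESHOLDS = {"985/211": 0.55, "ordinary": 0.45}


def decide(risk_score, university_type):
    """Mark as high risk only when the score strictly exceeds the threshold."""
    threshold = THRESHOLDS[university_type]
    return "high risk" if risk_score > threshold else "approved"


print(decide(0.094, "985/211"))  # Scenario 1
print(decide(1.02, "ordinary"))  # Scenario 2
```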

[0048] The audit traceability process is as follows: query the responsibility chain ID by entering RC-20251115-7a8b9c in the university research audit platform; view the decision basis and confirm the risk score calculation process; view the manual review and confirm the reviewer and the review time, 2025-11-15 14:30; verify the threshold setting by confirming that the university type is ordinary university and that the corresponding risk score threshold is 0.45.

[0049] The manual review process is as follows: when the risk score exceeds the threshold (1.02 > 0.45), the reimbursement is automatically pushed to the reviewer; the review interface displays the complete decision basis, including university type = ordinary university and threshold = 0.45; the reviewer confirms the risk score calculation process, checks for sub-task conflicts, and determines whether the reimbursement passes; a new responsibility chain ID is generated, and the review result, reason, operator, and time are recorded and associated with the original responsibility chain ID.

[0050] All formulas in this invention operate on dimensionless numerical values. Dimensionless values can be obtained through various methods such as standardization, which are not elaborated here. The formulas were obtained by software simulation on a large amount of collected data so as to approximate real-world conditions as closely as possible. The preset parameters in the formulas can be set by those skilled in the art according to the actual situation.

[0051] In conclusion, the above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. An AI-based intelligent financial auditing and compliance risk control method for universities, characterized in that it comprises:
S1: Extract the expense description text and amount from the university's financial system; extract the budget details, project type identifier, project stage identifier, completion date, and project sub-task relationship diagram from the project management system; perform field standardization on the extracted data, including standardizing the date format, standardizing the amount unit, and filtering invalid text;
S2: Load the corresponding scientific research project terminology database according to the project type identifier; extract keywords belonging to the terminology database from the reimbursement description text and the project application budget details text based on the Word2Vec model, and generate keyword vectors; calculate the cosine similarity between the reimbursement keyword vector and the project keyword vector to obtain the matching score; calculate the ambiguity based on the probability distribution of keywords in the terminology database, the probability distribution data being generated from historical expense reimbursement texts by an LSTM neural network model; determine the time sensitivity based on the project stage identifier and completion date; calculate the corrected ambiguity;
S3: Extract the current subtask identifier from the expense report and retrieve the list of associated subtasks from the project management system; check the logical correlation between the current subtask and the associated subtasks against a preset logical conflict rule base for scientific research subtasks, the rule base being generated from historical project conflict data using a random forest model; calculate the conflict probability from the rule base matching degree and the logical consistency weight;
S4: Determine the weighting coefficient based on the project type identifier; determine the project cycle consistency based on the relationship between the reimbursement time and the project cycle; calculate the risk score based on the corrected ambiguity, matching score, project cycle consistency, time sensitivity, and conflict probability;
S5: Determine whether the reimbursement is high risk based on the risk score: if the risk score exceeds the preset threshold, mark it as high risk; otherwise, approve the reimbursement application; record the risk score, corrected ambiguity, conflict probability, time sensitivity, and decision basis in the audit log to form a traceable chain of responsibility.

2. The AI-based intelligent financial auditing and compliance risk control method for universities as described in claim 1, characterized in that S1 further includes: converting the expense description text and amount into a standardized format through field mapping rules, where the expense description text retains the specific semantic tags for scientific research reimbursement and the amount field is uniformly set to a preset unit; the project subtask relationship diagram is structured logical dependency data, including [subtask A → associated subtask B]; filtering invalid text based on a preset feature database of university research reimbursement, which includes a set of invalid words and a set of valid words for the research scenario; calculating the remaining days and determining the sensitive period based on the project completion date obtained from the project management system, with all date fields in a standardized format; and verifying whether the reimbursement date is within the project cycle and whether the subtask identifier of the reimbursement document exists in the project subtask association diagram.

3. The AI-based intelligent financial auditing and compliance risk control method for universities as described in claim 1, characterized in that S2 further includes: loading the corresponding scientific research project terminology database according to the project type identifier, where the terminology database contains keywords and their probability distribution data in the university scientific research scenario, and the probability distribution data is generated by an LSTM neural network model learning from historical reimbursement texts; the training data for the LSTM neural network model is historical expense reimbursement texts from universities, the model input is the expense description text of each reimbursement, and the model output is the keyword probability distribution; generating keyword vectors for the reimbursement description text and the project application budget details text based on the Word2Vec model, where the keyword vector is a word frequency vector with a dimension equal to the total number of keywords in the terminology database, and the vector elements are the embedding vectors of the keywords in the Word2Vec model; calculating the ambiguity based on the probability distribution, where the ambiguity formula is not reproduced in this text, the semantic probability of each keyword in it is generated by the LSTM model, and n is the total number of keywords in the reimbursement description text that belong to the terminology database; determining the time sensitivity based on the project stage identifier and completion date; and calculating the corrected ambiguity, which equals the product of the ambiguity and the time sensitivity, so that the impact of ambiguity remains unchanged during the project completion stage and is halved during the non-completion stage.

4. The AI-based intelligent financial auditing and compliance risk control method for universities as described in claim 1, characterized in that S3 further includes: matching the subtask identifier of the expense report against the subtask list in the association diagram using field mapping rules; determining the logical correlation matching degree based on a preset logical conflict rule base for scientific research subtasks, where the rule base is a CSV file generated from historical project conflict data by a random forest model and contains only scientific research scenario rules; presetting the logical consistency weight according to the logical correlation strength of the subtasks; and calculating the conflict probability by multiplying the matching degree by the logical consistency weight.

5. The AI-based intelligent financial auditing and compliance risk control method for universities as described in claim 1, characterized in that S4 further includes: loading the corresponding weight coefficient k according to the project type identifier, and determining the project cycle consistency based on the relationship between the reimbursement time and the project cycle; and calculating the risk score based on the corrected ambiguity, matching score, project cycle consistency, time sensitivity, and conflict probability, where risk score = [0.7 + k × (1 - corrected ambiguity)] × (1 - matching score) + 0.3 × (1 - project cycle consistency) × time sensitivity + 0.5 × conflict probability.

6. The AI-based intelligent financial auditing and compliance risk control method for universities as described in claim 1, characterized in that S5 further includes: setting the preset risk score threshold based on data from the university research reimbursement pilot program; recording the decision basis quantitatively, including the risk score, corrected ambiguity, conflict probability, and time sensitivity, and additionally recording the manual reviewer and review time in high-risk cases; and, based on identity mapping, binding the log records to the reimbursement applicant, project leader, and reviewer, generating a unique responsibility chain ID, and synchronizing the logs to the audit platform.