Artificial intelligence-based intelligent investment risk assessment and decision support method
By combining a self-verifying retrieval enhancement framework with a large-scale language model, the deep correlation and compliance issues of multi-source financial data are resolved, enabling efficient and reliable investment risk assessment and decision support, and improving the integration capabilities of financial data and the security of assessment results.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- LAB FOR AI-POWERED FINANCIAL TECH LTD
- Filing Date
- 2026-03-05
- Publication Date
- 2026-06-12
AI Technical Summary
Existing AI-based investment risk assessment methods lack deep semantic and structural connections when integrating multi-source heterogeneous financial data. The result is fragmented information and one-sided retrieval results that capture neither the precise meaning of professional terminology nor the complex relationship network between entities. Furthermore, these methods fail to effectively embed financial compliance requirements, leading to insufficient reliability and security of assessment results.
A self-verifying retrieval enhancement framework is adopted for data source alignment and cleaning, semantic index, keyword index and knowledge graph index are constructed, and a large language model is used for cross-validation of relevance, factual consistency and financial compliance. The final risk assessment result is generated by combining a hierarchical intelligent agent framework and decision support is provided by a financial expert model.
It achieves deep integration and self-verification of multi-dimensional information, improves the depth and breadth of knowledge retrieval, ensures the reliability and security of evaluation results, reduces evaluation bias caused by information distortion or non-compliance, and enhances the reliability of decision-making.
Figure CN122199152A_ABST
Description
Technical Field
[0001] This invention relates to the field of artificial intelligence financial technology, and in particular to an AI-based intelligent investment risk assessment and decision support method.
Background Technology
[0002] Current AI-based investment risk assessment methods typically employ structured data processing or single-mode text analysis techniques. When integrating heterogeneous financial data from multiple sources such as market data, financial reports, and news, existing technologies often perform simple format conversions or independent modeling, lacking effective fusion of deep semantic and structural relationships between different data sources. This results in a fragmented underlying information foundation, making it difficult to support a comprehensive risk profile of investment targets. In the information retrieval stage, mainstream methods rely on keyword matching or general semantic search, failing to simultaneously capture the precision of professional terminology, the contextual semantics of text, and the complex relationship networks between entities. This leads to one-sided and weakly correlated search results, directly impacting the depth and reliability of subsequent risk analysis.
[0003] Existing solutions, when using large-scale language models to generate or process financial content, mostly focus on the superficial relevance of information or basic fact-checking, without systematically embedding the stringent regulatory and compliance requirements of the financial sector into the verification process. Models may directly use or generate text containing subjective predictions, inappropriate advice, or contradictions with regulatory provisions, introducing new compliance and factual risks into the assessment process itself, and compromising the objectivity and security of the output results. A technical solution is needed that can deeply integrate multi-dimensional financial information from the ground up and perform rigorous factual and compliance filtering before content generation.
Summary of the Invention
[0004] The purpose of this invention is to address the shortcomings of existing technologies by proposing an intelligent investment risk assessment and decision support method based on artificial intelligence.
[0005] To achieve the above objective, the present invention adopts the following technical solution: an intelligent investment risk assessment and decision support method based on artificial intelligence, comprising:
obtaining multi-source heterogeneous financial data of the target investment object, and performing data source alignment and cleaning on the multi-source heterogeneous financial data to obtain standardized time-series data;
converting the standardized time-series data from structured form into text to generate a textual financial report in a unified format;
constructing, based on a self-verifying retrieval enhancement framework, a triple index for the textual financial report, namely a semantic index, a keyword index, and a knowledge graph index;
receiving a user's risk assessment query, parsing the query through an intent routing mechanism, and triggering a hybrid retrieval based on the semantic index, the keyword index, and the knowledge graph index to obtain an initial set of knowledge blocks;
cross-validating the initial set of knowledge blocks for relevance, factual consistency, and financial compliance using a large language model, and selecting the knowledge blocks that pass the validation as the security knowledge context;
inputting the security knowledge context and the risk assessment query into a pre-trained financial expert model to generate a preliminary risk assessment report; and
decomposing and logically verifying the preliminary risk assessment report using a hierarchical intelligent agent framework to generate the final risk assessment result and decision support suggestions.
[0006] Preferably, the step of aligning and cleaning the multi-source heterogeneous financial data to obtain standardized time-series data includes: The numerical data and text data in the multi-source heterogeneous financial data are identified, and the missing values in the numerical data are filled in using time series interpolation, and the non-standard terms in the text data are normalized by financial dictionary mapping. The filled and normalized data are aligned according to a unified timestamp benchmark to eliminate time granularity differences between different data sources; Calculate the statistical feature values of the aligned data, identify and remove abnormal data points that deviate from the preset threshold range based on the statistical feature values, and obtain standardized time series data.
[0007] Preferably, the step of constructing a triple index for the textual financial report based on the self-verifying retrieval enhancement framework, generating a semantic index, a keyword index, and a knowledge graph index, includes: vectorizing the textual financial report using a semantic encoding model and storing the resulting vectors as the semantic index; extracting key financial entities and terms from the textual financial report and constructing a term inverted list to generate the keyword index; and parsing the entity relationships in the textual financial report and constructing a graph structure with entities as nodes and relationships as edges to generate the knowledge graph index.
[0008] Preferably, the step of using a large language model to cross-validate the initial knowledge block set for relevance, factual consistency, and financial compliance, and selecting the validated knowledge blocks as the security knowledge context, includes: instructing the large language model to determine the relevance of each initial knowledge block to the topic of the risk assessment query, output a relevance score, and discard initial knowledge blocks whose relevance scores are lower than a preset relevance threshold; instructing the large language model to compare each initial knowledge block against the corresponding factual data in the standardized time-series data and mark contradictory knowledge blocks; instructing the large language model to check the content of each initial knowledge block against a preset financial compliance rule base and mark potentially non-compliant content; and retaining only the initial knowledge blocks that pass the relevance judgment, consistency comparison, and compliance verification, and combining them into the security knowledge context. The steps for constructing the large language model include: constructing a pre-training corpus by collecting massive amounts of general text data and professional text data from the financial field; using a neural network model with a Transformer architecture as the base model and performing self-supervised pre-training on the pre-training corpus to learn general language representations and financial domain knowledge; after pre-training, performing supervised fine-tuning of the base model on an instruction fine-tuning dataset so that the model can understand and execute complex instructions related to risk assessment; and optimizing the fine-tuned model using reinforcement learning from human feedback so that the model output better meets the professional requirements and safety standards of risk assessment tasks, thereby obtaining the large language model.
[0009] Preferably, the step of inputting the security knowledge context and the risk assessment query into a pre-trained financial expert model to generate a preliminary risk assessment report includes: training the financial expert model using a teacher-student architecture, where the teacher model is fully fine-tuned on a reasoning-enhanced dataset with chain-of-thought annotations to learn the reasoning logic of financial experts, and the student model reproduces the reasoning ability of the teacher model through knowledge distillation and then undergoes quantization and inference optimization; and having the financial expert model reason over the risk assessment query based on the security knowledge context to generate a preliminary risk assessment report that includes risk dimension analysis, risk level determination, and supporting references.
[0010] Preferably, the step of decomposing and logically verifying the preliminary risk assessment report using a hierarchical intelligent agent framework to generate the final risk assessment result and decision support suggestions includes: the planner agent decomposes the analysis conclusions of the preliminary risk assessment report into multiple atomic tasks, including risk validity verification, data consistency review, and decision suggestion generation; the adapter agent translates each atomic task into the call parameters of the corresponding backend data query function or rule verification function; the executor agent retrieves the latest data from a verified deterministic financial data source based on the call parameters, and performs risk indicator recalculation and logical rule matching; the synthesizer agent integrates the recalculation results and rule matching results of the executor agent to revise and enrich the preliminary risk assessment report, generating a final risk assessment result with confidence level labels; and specific decision support suggestions are generated based on the final risk assessment result.
[0011] Preferably, the final output is also validated by a reviewer agent: the reviewer agent checks the logical coherence between the final risk assessment result generated by the synthesizer agent and the preliminary risk assessment report; the reviewer agent verifies that all data sources cited in the final risk assessment result have been verified by the self-verifying retrieval enhancement framework; when a logical break or an unverified data reference is detected, the relevant task is returned to the planner agent for reprocessing; and the final risk assessment result and decision support suggestions are approved for output only if all verification items pass.
[0012] Preferably, this also includes optimizing the continuity of risk assessment based on the intelligent dialogue history management system: The system records the history of multiple rounds of dialogue with the user, including each risk assessment query and the corresponding final risk assessment result; Through real-time intent analysis, recent dialogue history is preserved in its original form, intermediate dialogue history is distilled into a task summary, and data pointers are used to replace the massive data objects in the security knowledge context. In subsequent risk assessments, an optimal context containing recent original text, interim summary, and data pointers is dynamically constructed and input into the financial expert model.
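The three-tier history handling described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the `data://turn/N` pointer format, and the truncation-based `summarize` (standing in for a model-written task summary) are all assumptions introduced for the example.

```python
def summarize(turn, max_chars=60):
    """Stand-in for an LLM-written task summary: truncate the query text."""
    return turn["query"][:max_chars]


def build_dialogue_context(turns, store, recent_n=2):
    """Split history into summaries, verbatim recent turns, and data pointers.

    turns: list of dicts with "query", "result", and optionally a large "data" object.
    store: dict that receives the large objects, keyed by pointer string.
    """
    split = max(len(turns) - recent_n, 0)
    older, recent = turns[:split], turns[split:]

    pointers = []
    for index, turn in enumerate(turns):
        if "data" in turn:
            key = f"data://turn/{index}"   # hypothetical pointer format
            store[key] = turn["data"]      # the bulky object stays outside the prompt
            pointers.append(key)

    return {
        "summaries": [summarize(t) for t in older],              # distilled older turns
        "recent": [(t["query"], t["result"]) for t in recent],   # kept verbatim
        "pointers": pointers,                                    # references, not payloads
    }
```

Only the returned dictionary would be passed to the financial expert model; the objects behind the pointers are fetched on demand.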
[0013] Preferably, it also includes presenting risk assessment results through a split intelligent rendering architecture: The financial expert model generates a lightweight, dynamically rendered template containing data placeholders, rather than a static report containing complete data. After the client requests the dynamic rendering template, it requests the core data in the final risk assessment result from the data service layer through an asynchronous application programming interface; After receiving the core data, the client completes the chart drawing and report rendering locally, realizing the visualization of the risk assessment results.
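A minimal sketch of the split rendering idea, written in Python for consistency even though a real client would more likely be a browser application. The template text, the field names, and `fetch_core_data` (a stand-in for the asynchronous API call to the data service layer) are hypothetical.

```python
import asyncio
from string import Template

# Lightweight template with data placeholders, as the expert model might emit it.
REPORT_TEMPLATE = Template("Risk level: $risk_level (confidence $confidence)")


async def fetch_core_data(report_id):
    """Stand-in for the asynchronous request to the data service layer."""
    await asyncio.sleep(0)  # yields control, as a real network call would
    return {"risk_level": "low", "confidence": "0.86"}


async def render(report_id):
    """Client side: fetch the core data, then fill the template locally."""
    data = await fetch_core_data(report_id)
    return REPORT_TEMPLATE.substitute(data)


def render_sync(report_id):
    """Convenience wrapper for callers that are not already in an event loop."""
    return asyncio.run(render(report_id))
```

Because the template carries no data, the model's output stays small, and the numbers shown to the user come from the data service rather than from generated text.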
[0014] Preferably, the training process of the pre-trained financial expert model includes: We rely on financial experts to perform high-precision labeling on the original financial texts to create a high-quality seed dataset. Synonym replacement, back translation, and rule-based template filling techniques are used to augment the seed dataset and expand its size. Using a data-augmented dataset, combined with a teacher-student architecture and knowledge distillation techniques, the basic large-scale language model is fine-tuned in stages to obtain the financial expert model.
[0015] Compared with the prior art, the advantages and positive effects of the present invention are as follows: A self-verifying retrieval enhancement framework is adopted, simultaneously constructing semantic indexes, keyword indexes, and knowledge graph indexes, and triggering hybrid retrieval through intent routing. This technical solution can perform complementary information capture and alignment from three dimensions: the deep meaning of the text, precise terminology matching, and structured relationships between entities. The self-verifying mechanism of the retrieval process compares and integrates results from different indexes, effectively avoiding the limitations of a single retrieval mode, improving the depth, breadth of coverage, and consistency of internal logic of knowledge retrieval, and providing a more solid and comprehensive information foundation for risk assessment.
[0016] A large-scale language model is used to simultaneously perform triple cross-validation on the initial set of retrieved knowledge blocks, including semantic relevance verification, factual consistency verification among multiple sources, and specific compliance verification for financial texts. The compliance verification automatically identifies and filters text fragments containing illegal promises, inappropriate inducements, or regulatory-sensitive content based on pre-set financial regulatory rules and industry standards. This technical solution constructs a security filter before information flows into the decision-making model, reducing assessment bias and decision-making risks caused by information distortion, contradictions, or non-compliance, and enhancing the reliability and security of the entire system's output.
Brief Description of the Drawings
[0017] Figure 1 is a flowchart of the AI-based intelligent investment risk assessment and decision support method described in this invention;
Figure 2 is a flowchart of data source alignment and cleaning;
Figure 3 is a flowchart of verification and generation in the hierarchical intelligent agent framework;
Figure 4 is a comparison chart of efficiency indicators before and after the implementation of the intelligent dialogue history management system;
Figure 5 is a graph showing the relationship between dataset size expansion and the percentage of effective samples for the data augmentation techniques.
Detailed Description
[0018] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention.
[0019] In the description of this invention, it should be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," and "outer," etc., indicating orientation or positional relationships, are based on the orientation or positional relationships shown in the accompanying drawings and are only for the convenience of describing the invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation, and therefore should not be construed as a limitation of the invention. Furthermore, in the description of this invention, "a plurality of" means two or more, unless otherwise explicitly specified.
[0020] Referring to Figure 1, the system acquires multi-source heterogeneous financial data on the target investment object and aligns and cleans this data to obtain standardized time-series data. It then performs structured-to-text conversion on the standardized time-series data to generate a unified-format textual financial report. Based on a self-verifying retrieval enhancement framework, the system constructs a triple index for the generated textual financial report: a semantic index, a keyword index, and a knowledge graph index. When a user's risk assessment query is received, the system parses the query through an intent routing mechanism and triggers a hybrid retrieval based on the triple index, obtaining an initial set of knowledge blocks. A large-scale language model is used to cross-validate this initial set of knowledge blocks for relevance, factual consistency, and financial compliance, selecting the validated knowledge blocks as the security knowledge context. The security knowledge context and the risk assessment query are input into a pre-trained financial expert model to generate a preliminary risk assessment report. A hierarchical intelligent agent framework is used to decompose and logically validate the preliminary risk assessment report, generating the final risk assessment result and decision support suggestions.
[0021] In one embodiment of the present invention, referring to Figure 2, multi-source heterogeneous financial data is processed in preparation for index construction. For a target investment, such as the stock of a listed company, the multi-source heterogeneous financial data includes real-time trading data from stock exchanges, textual data from company financial statements, and data from financial news reports. The data source alignment and cleaning process identifies numerical and textual data within the multi-source heterogeneous financial data. Missing values in numerical data, such as stock price sequences, are filled using time series interpolation. Non-standard terms in textual data, such as "revenue" in financial statements, are normalized to the standard term "operating income" through a financial dictionary mapping. The filled and normalized data is aligned according to a unified timestamp benchmark, for example by aligning all data to the daily closing time, to eliminate time granularity differences between data sources such as minute-level trading data and daily-level financial data. Statistical features of the aligned data, such as the mean and standard deviation, are calculated. Based on these statistical features, outlier data points deviating from a preset threshold range are identified and removed. The anomaly detection criterion is defined by the formula

$$z_i = \frac{x_i - \mu}{\sigma}, \qquad |z_i| > \tau \;\Rightarrow\; x_i \text{ is removed as an outlier}$$

where $z_i$ is the standardized score of data point $x_i$, $\mu$ is the mean of the data sequence, $\sigma$ is the standard deviation of the data series, and $\tau$ is a preset threshold. Removing the flagged points yields the standardized time-series data.
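The cleaning steps in this paragraph can be illustrated with a short standard-library sketch. It assumes linear interpolation for gap filling, a z-score test with the population standard deviation for outlier removal, and ISO-formatted timestamps; the dictionary contents, function names, and default threshold are illustrative and not taken from the patent.

```python
from statistics import mean, pstdev

# Hypothetical financial dictionary: non-standard term -> standard term.
TERM_MAP = {"revenue": "operating income"}


def normalize_terms(text, term_map=TERM_MAP):
    """Replace non-standard terms with their dictionary-mapped standard form."""
    for raw, standard in term_map.items():
        text = text.replace(raw, standard)
    return text


def fill_missing_linear(values):
    """Fill None gaps by linear interpolation; edge gaps copy the nearest value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def align_to_daily_close(ticks):
    """Keep the last observation per calendar day from time-sorted (iso_timestamp, value) pairs."""
    daily = {}
    for timestamp, value in ticks:
        daily[timestamp[:10]] = value  # later ticks overwrite earlier ones
    return daily


def remove_outliers(values, threshold=3.0):
    """Drop points whose absolute z-score exceeds the preset threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return list(values)
    return [v for v in values if abs((v - mu) / sigma) <= threshold]
```

For example, `fill_missing_linear([10.0, None, 12.0])` returns `[10.0, 11.0, 12.0]`. Spline interpolation or a robust statistic such as the median absolute deviation could be swapped in where the data characteristics call for it.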
[0022] In some embodiments, a triple index is constructed for textual financial reports based on a self-verifying retrieval enhancement framework. The textual financial reports are generated by transforming standardized time-series data. Semantic encoding models such as BERT are used to vectorize the textual financial reports. The results of the vectorization are stored as a semantic index. Key financial entities and terms such as "net profit" and "price-to-earnings ratio" are extracted from the textual financial reports. An inverted list of terms is constructed to generate a keyword index. Entity relationships in the textual financial reports, such as "Company A belongs to industry B", are parsed. A graph structure with entities as nodes and relationships as edges is constructed to generate a knowledge graph index.
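A toy version of the triple index is sketched below. The bag-of-words `embed` function is only a stand-in for a semantic encoder such as BERT, entity relationships are supplied by the caller instead of being extracted from text, and the class and method names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def embed(text):
    """Toy stand-in for a semantic encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TripleIndex:
    def __init__(self, terms):
        self.terms = [t.lower() for t in terms]  # key financial terms to index
        self.chunks = []                         # raw report chunks
        self.vectors = []                        # semantic index
        self.inverted = defaultdict(set)         # keyword index: term -> chunk ids
        self.graph = defaultdict(set)            # knowledge graph: head -> {(relation, tail)}

    def add_chunk(self, text, triples=()):
        """Index one chunk in all three structures and return its id."""
        chunk_id = len(self.chunks)
        self.chunks.append(text)
        self.vectors.append(embed(text))
        lowered = text.lower()
        for term in self.terms:
            if term in lowered:
                self.inverted[term].add(chunk_id)
        for head, relation, tail in triples:
            self.graph[head].add((relation, tail))
        return chunk_id

    def semantic_search(self, query, k=1):
        """Return the ids of the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(range(len(self.chunks)),
                        key=lambda i: cosine(q, self.vectors[i]), reverse=True)
        return ranked[:k]

    def keyword_search(self, term):
        """Exact term lookup in the inverted list."""
        return sorted(self.inverted.get(term.lower(), set()))

    def neighbors(self, entity):
        """Relationship query: outgoing edges of an entity node."""
        return sorted(self.graph.get(entity, set()))
```

A hybrid retrieval step would call all three lookups and merge the returned chunk ids before validation.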
[0023] In practical implementation, data comparison is reflected in the improved consistency of data before and after cleaning. For example, before cleaning, stock price sequences contain missing values and abnormal fluctuations; after cleaning, they form continuous, smooth, and standardized time-series data. After index construction, retrieval efficiency is optimized through hybrid retrieval, such as semantic indexing supporting similarity search, keyword indexing supporting exact matching, and knowledge graph indexing supporting relationship queries. Optionally, time series interpolation uses linear interpolation or spline interpolation, depending on the data characteristics. It is understood that financial dictionary mapping normalization uses a predefined financial domain terminology dictionary to ensure terminology consistency. In some embodiments, the training of the semantic encoding model involves financial domain corpora to improve the accuracy of vectorized representations. Optionally, the construction of the terminology inverted list includes synonym expansion to enhance the coverage of the keyword index. It is understood that the entity relationship parsing of the knowledge graph index is based on rule-based or machine learning models to extract structured information.
[0024] In one embodiment of the present invention, referring to Figure 3, the initial knowledge block set is validated and the large language model is constructed. For a user-input risk assessment query, such as "assess company X's policy risks in the new energy field," the initial knowledge block set obtained through hybrid retrieval may contain different fragments from news and research reports. The large language model is used to cross-validate the initial knowledge block set for relevance, factual consistency, and financial compliance. The large language model is first instructed to determine the relevance of each initial knowledge block to the topic of the risk assessment query and to output a relevance score. The final relevance decision is calculated as

$$D_i = \begin{cases} 1, & s_i \ge \theta \\ 0, & s_i < \theta \end{cases}$$

where $D_i$ is a Boolean decision indicating whether to retain knowledge block $i$, $s_i$ is the relevance score output by the large language model, and $\theta$ is a preset relevance threshold; initial knowledge blocks with relevance scores below this threshold are discarded. In some embodiments, the large language model is instructed to compare the initial knowledge block against the corresponding factual data in the standardized time-series data. For example, if the initial knowledge block claims "Company Y's quarterly revenue increased by 15% year-on-year," the large language model checks the financial data sequence of Company Y in the standardized time-series data to confirm this fact, and marks contradictory knowledge blocks. The large language model is then instructed to verify the compliance of the initial knowledge block content against a preset financial compliance rule base. This rule base covers violation types such as the use of insider information and false statements, and potentially non-compliant content is marked. Only the initial knowledge blocks that pass the relevance judgment, consistency comparison, and compliance verification are retained and combined into the security knowledge context.
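The three-way filter can be sketched as a simple predicate over knowledge blocks. In this illustration the relevance score and the consistency flag are supplied as precomputed fields (in the described method they would come from the large language model), and the phrase list standing in for the compliance rule base is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class KnowledgeBlock:
    text: str
    relevance: float   # score an LLM judge would output, assumed to lie in [0, 1]
    consistent: bool   # outcome of the factual-consistency comparison


# Hypothetical stand-in for the preset financial compliance rule base.
BANNED_PHRASES = ("guaranteed return", "insider information")


def is_compliant(text, banned=BANNED_PHRASES):
    """Flag text containing any phrase from the rule base."""
    lowered = text.lower()
    return not any(phrase in lowered for phrase in banned)


def keep(block, threshold):
    """Retain a block only if it passes all three checks."""
    return (block.relevance >= threshold
            and block.consistent
            and is_compliant(block.text))


def build_safe_context(blocks, threshold=0.6):
    """Combine the surviving blocks, in their original order, into the context."""
    return [b.text for b in blocks if keep(b, threshold)]
```

A block is dropped as soon as any single check fails, so the context passed to the expert model contains only content that cleared relevance, consistency, and compliance together.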
[0025] In practical implementation, data comparison is reflected in the difference in the quality of knowledge blocks before and after verification. For example, before verification, the initial knowledge block set may contain information that is outdated or only weakly related to the query topic, while the security knowledge context formed after verification is highly relevant to the query, factually accurate, and compliant with regulatory requirements. The construction steps of the large-scale language model include collecting massive amounts of general text data and financial professional text data to form a pre-training corpus. The financial professional text data includes listed company annual reports, securities firm research reports, and financial regulations. A neural network model with a Transformer architecture is used as the base model, and self-supervised pre-training is performed on the pre-training corpus to learn general language representations and financial domain knowledge. After pre-training, the base model is fine-tuned in a supervised manner using an instruction fine-tuning dataset. This dataset contains examples of financial risk assessment tasks in "instruction-input-output" form, enabling the model to understand and execute complex instructions related to risk assessment. Optionally, the fine-tuned model is optimized using reinforcement learning from human feedback, in which risk analysts rank the model's outputs to train a reward model, making the model outputs more consistent with the professional requirements and safety standards of risk assessment tasks and yielding the large-scale language model. In some embodiments, the preset financial compliance rule base takes the form of structured rules and keyword lists, and the large-scale language model calls the rule base for pattern matching and logical judgment. Optionally, in the factual consistency comparison process, the large-scale language model transforms the claims in the knowledge blocks into structured queries and retrieves numerical evidence from a standardized time-series database for verification. The security knowledge context is assembled in chronological order and by logical association to ensure information coherence.
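The optional claim-to-query verification might look like the following sketch. It assumes the claim has already been parsed into a structured record (the parsing itself would be done by the language model) and that the standardized time-series data is available as a dictionary keyed by company, metric, and quarter; the record layout and the tolerance are illustrative choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GrowthClaim:
    company: str
    metric: str
    quarter: str           # e.g. "2025Q4"
    prior_quarter: str     # the same quarter one year earlier
    claimed_growth: float  # 0.15 means +15% year-on-year


def verify_growth(claim, series, tolerance=0.005):
    """Return True or False when the claim can be checked, None when data is missing."""
    try:
        current = series[(claim.company, claim.metric, claim.quarter)]
        prior = series[(claim.company, claim.metric, claim.prior_quarter)]
    except KeyError:
        return None  # no evidence available: neither confirmed nor contradicted
    if prior == 0:
        return None  # growth rate is undefined
    actual = current / prior - 1.0
    return abs(actual - claim.claimed_growth) <= tolerance
```

Returning `None` for missing data keeps "unverifiable" distinct from "contradicted", which matters when deciding whether a block is merely set aside or actively marked as contradictory.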
[0026] In one embodiment of the present invention, the financial expert model is applied and the hierarchical intelligent agent framework is run as follows. The security knowledge context and the risk assessment query are input into a pre-trained financial expert model to generate a preliminary risk assessment report. The security knowledge context includes verified information about the target company's financial data and industry policies. The risk assessment query is, for example, "analyze the target company's debt repayment risk over the next twelve months." The financial expert model is trained using a teacher-student architecture. The teacher model undergoes full fine-tuning on a reasoning-enhanced dataset with chain-of-thought annotations. The reasoning-enhanced dataset contains financial analysis cases with step-by-step reasoning, from which the teacher learns the logic of financial experts. The student model reproduces the reasoning ability of the teacher model through knowledge distillation, which uses the output probability distribution of the teacher model as soft labels to guide the training of the student model. The student model then undergoes quantization and inference optimization to improve deployment efficiency. Based on the security knowledge context, the financial expert model reasons over the risk assessment query and generates a preliminary risk assessment report that includes risk dimension analysis, risk level determination, and supporting citations. The preliminary risk assessment report may state, for example, "moderate liquidity risk, based on an operating cash flow coverage ratio of only 1.2."
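The soft-label idea behind knowledge distillation can be shown with a few lines of standard-library Python. The text above only states that the teacher's output distribution guides the student; the temperature-softened softmax, the KL divergence objective, and the squared-temperature scaling used here follow the common formulation of distillation and are assumptions rather than details from the patent.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperatures give softer distributions."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by temperature squared."""
    p = softmax(teacher_logits, temperature)   # soft labels from the teacher
    q = softmax(student_logits, temperature)   # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

The loss is zero when the student matches the teacher exactly and grows as the two distributions diverge. In training it is typically combined with an ordinary loss on ground-truth labels.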
[0027] In practical implementation, a hierarchical intelligent agent framework is used to decompose and logically verify the preliminary risk assessment report. The planner agent decomposes the analysis conclusions of the preliminary risk assessment report into multiple atomic tasks, including risk validity verification, data consistency review, and decision suggestion generation. The adapter agent translates each atomic task into the call parameters of the corresponding backend data query function or rule verification function. For example, "verify cash flow coverage ratio" is translated into a function call that queries the latest quarterly cash flow statement and short-term liabilities. The executor agent obtains the latest data from verified deterministic financial data sources, including APIs from authoritative financial data providers, based on the call parameters. It then performs risk indicator recalculation and logical rule matching. The synthesizer agent integrates the recalculation results and rule matching results of the executor agent to revise and enrich the preliminary risk assessment report. Revisions include updating data or adjusting risk levels. The synthesizer agent generates the final risk assessment result with confidence level labels. The confidence level of the final risk assessment result is calculated as

$$C_{\text{final}} = \alpha \, C_{\text{init}} + \beta \, S_{\text{cons}}$$

where $C_{\text{final}}$ is the final confidence level, $C_{\text{init}}$ is the initial confidence level of the financial expert model's output, $S_{\text{cons}}$ is the data consistency score from the executor agent's verification process, and $\alpha$ and $\beta$ are preset weighting coefficients. Specific decision support suggestions are then generated based on the final risk assessment result, such as "It is recommended to increase the allocation of short-term liquid assets."
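The planner, adapter, executor, and synthesizer roles can be sketched as plain functions. Everything concrete below is an assumption made for illustration: the in-memory `LATEST_DATA` source, the risk-level cut-offs, the consistency score defined as one minus the relative deviation, and a confidence computed as a weighted sum of the initial confidence and that score with weights 0.6 and 0.4.

```python
from dataclasses import dataclass


@dataclass
class PreliminaryReport:
    company: str
    coverage_ratio: float  # operating cash flow / short-term liabilities
    risk_level: str
    confidence: float      # initial confidence from the expert model


# Hypothetical deterministic data source, keyed by company name.
LATEST_DATA = {"Target Co": {"operating_cash_flow": 150.0, "short_term_liabilities": 100.0}}


def plan(report):
    """Planner: decompose the report's conclusions into atomic tasks."""
    return ["verify_cash_flow_coverage"]


def adapt(task, report):
    """Adapter: translate an atomic task into call parameters."""
    if task == "verify_cash_flow_coverage":
        return {"function": "query_cash_flow", "company": report.company}
    raise ValueError(f"unknown task: {task}")


def execute(call, source=LATEST_DATA):
    """Executor: fetch the latest figures and recompute the indicator."""
    row = source[call["company"]]
    return row["operating_cash_flow"] / row["short_term_liabilities"]


def classify(ratio):
    """Hypothetical cut-offs mapping the coverage ratio to a risk level."""
    if ratio >= 1.5:
        return "low"
    if ratio >= 1.0:
        return "medium"
    return "high"


def synthesize(report, recomputed, alpha=0.6, beta=0.4):
    """Synthesizer: revise the report and attach a weighted confidence label."""
    if recomputed > 0:
        consistency = max(0.0, 1.0 - abs(recomputed - report.coverage_ratio) / recomputed)
    else:
        consistency = 0.0
    confidence = alpha * report.confidence + beta * consistency
    return {"company": report.company, "coverage_ratio": recomputed,
            "risk_level": classify(recomputed), "confidence": round(confidence, 4)}


def run(report):
    """Run plan -> adapt -> execute -> synthesize for every planned task."""
    result = None
    for task in plan(report):
        result = synthesize(report, execute(adapt(task, report)))
    return result
```

With a preliminary ratio of 1.2, an initial confidence of 0.9, and recomputed data giving 1.5, the sketch revises the risk level from "medium" to "low" under these assumed cut-offs.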
[0028] In some embodiments, data comparison is reflected in the differences between the preliminary and final reports. The preliminary risk assessment report relies on the security knowledge context at the time of generation, while the final risk assessment result incorporates information recalculated and verified based on the latest deterministic data sources by the hierarchical agent framework. For example, the cash flow coverage ratio in the preliminary report is 1.2, but the latest data obtained by the executor agent shows that the ratio has been updated to 1.5, and the final report corrects the liquidity risk level from "medium" to "low". It is understood that the application of knowledge distillation technology enables the student model to maintain performance close to that of the teacher model while having faster inference speed and lower resource consumption. In some embodiments, the rule verification function library includes a series of predefined logical verification rules such as financial ratio health checks and regulatory red line violation detection. Optionally, if the synthesizer agent finds a significant conflict between the recalculated result and the preliminary report when integrating information, it will trigger a reassessment process for the relevant risk dimensions. It is understood that confidence level labeling provides a quantitative reference for the reliability of conclusions in subsequent decision-making processes. Optionally, the generation of decision support suggestions is based on preset risk response strategy templates, which are mapped to specific operational suggestions according to different risk levels and types.
[0029] In one embodiment of the present invention, the final output is verified by a reviewer agent. The reviewer agent checks the logical coherence between the final risk assessment result generated by the synthesizer agent and the preliminary risk assessment report. For example, the preliminary risk assessment report infers, from historical data, a risk that the company's debt ratio will increase, while the final risk assessment result incorporates the latest quarterly report, which shows a stable debt ratio. The reviewer agent verifies the logical coherence to ensure the integrity of the reasoning chain from historical trends to the latest data. The reviewer agent also verifies whether all data sources cited in the final risk assessment result have been verified by the self-verification retrieval enhancement framework. Data source identifiers include the company's financial statement numbers and market data timestamps. When a logical break or an unverified data reference is found, such as the final risk assessment result citing industry policy change information without a source, the reviewer agent returns the relevant task to the planner agent for reprocessing. Only when all verification items pass does the reviewer agent approve the output of the final risk assessment result and decision support suggestions. The data comparison is reflected in the difference in credibility between the reports before and after the review. For example, the report before the review may contain unverified data points, while all references in the report after the review are accompanied by traceable verification marks.
[0030] In some embodiments, the logical coherence comparison of the reviewer agent employs a hybrid approach based on rules and semantic similarity. Rule checks include causal matching between risk conclusions and data evidence. Semantic similarity calculation uses an embedding model to measure the consistency of key assertions between the preliminary and final reports. Data source verification by the reviewer agent is achieved through querying the verification log of a self-verifying retrieval enhancement framework. The verification log records the status of each data item during index building and cross-validation. It is understood that when a task is returned for reprocessing, the planner agent will re-decompose the task and prioritize calling the latest data source for review. Optionally, the reviewer agent will attach an integrity flag to the final risk assessment result before approving the output. This integrity flag is generated based on the percentage of verifications that passed.
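The reviewer agent's source check and integrity flag can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the verification log is modeled as a plain dictionary, the status string "verified" and the source identifiers are hypothetical, and the coherence checks are assumed to have been computed elsewhere and passed in as booleans.

```python
from dataclasses import dataclass


@dataclass
class ReviewResult:
    approved: bool
    integrity: float          # percentage of verification items that passed
    returned_tasks: list[str]  # failed items sent back to the planner agent


def review(citations: list[str],
           verification_log: dict[str, str],
           coherence_checks: dict[str, bool]) -> ReviewResult:
    """Approve only if every cited source and coherence check passes."""
    checks: dict[str, bool] = {}
    for source_id in citations:
        # A source counts as verified only if the log says so explicitly.
        checks[f"source:{source_id}"] = verification_log.get(source_id) == "verified"
    for name, passed in coherence_checks.items():
        checks[f"coherence:{name}"] = passed

    failed = [name for name, ok in checks.items() if not ok]
    integrity = 100.0 * (len(checks) - len(failed)) / len(checks) if checks else 0.0
    return ReviewResult(approved=bool(checks) and not failed,
                        integrity=integrity,
                        returned_tasks=failed)


if __name__ == "__main__":
    log = {"FS-2023Q3": "verified", "MKT-2023-10-01T09:30": "verified"}
    result = review(
        citations=["FS-2023Q3", "MKT-2023-10-01T09:30", "policy-change"],
        verification_log=log,
        coherence_checks={"debt_ratio_trend": True},
    )
    print(result)
```

In the usage shown, the unsourced policy-change citation is the only failed item, so the result is not approved and that item is returned for reprocessing.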
[0031] In practical implementation, the AI-based intelligent investment risk assessment and decision support method also includes optimizing the continuity of risk assessment based on an intelligent dialogue history management system. The system records the multi-round dialogue history with the user, including each risk assessment query and its corresponding final risk assessment result. For example, if a user repeatedly inquires about the market risk, credit risk, and operational risk of the same company within a week, the system uses real-time intent analysis to handle the history in three ways: it preserves recent dialogue history as original text (the complete text of the last three rounds of interaction); it distills intermediate dialogue history (interactions earlier than the last three rounds) into task summaries, compressing them into key points using natural language generation technology; and it replaces the massive data objects in the security knowledge context with data pointers. These data pointers point to the original standardized time-series data or index entries stored in the database. During subsequent risk assessments, an optimal context containing the recent original text, intermediate summaries, and data pointers is dynamically constructed and input into the financial expert model. The optimal context is constructed as a weighted combination of three components: the text content of the recent original text, the text content of the intermediate summaries, and the set of data pointers. The three weighting coefficients are dynamically adjusted based on the dialogue round and intent relevance. In some embodiments, real-time intent analysis is based on the semantic relevance between the user's current query and historical queries, with the relevance calculated using the cosine similarity of the query embedding vectors. 
Optionally, task summary generation uses a text summarization model to extract core decision points and risk conclusions from historical dialogues. It can be understood that the use of data pointers reduces the network overhead and model input length from directly transmitting large data objects, and the dynamically constructed optimal context ensures that the financial expert model maintains cognitive consistency in continuous dialogues. See Table 1.
[0032] Table 1: Dialogue History Management Table

| Dialogue round | User query content | System output result type | History storage format | Data pointer example |
| --- | --- | --- | --- | --- |
| 1 | Assess the market risk of Company A | Final risk assessment result | Intermediate summary (distilled) | ptr_financial_data_2023Q3 |
| 2 | Assess the credit risk of Company A | Final risk assessment result | Intermediate summary (distilled) | ptr_credit_report_2023 |
| 3 | Assess the operational risk of Company A | Final risk assessment result | Recent original text (retained) | ptr_incident_log_2023 |
| 4 | Comprehensively assess the overall risk of Company A | To be generated | Dynamically constructed optimal context | Pointer set: ptr_financial_data_2023Q3, ptr_credit_report_2023, ptr_incident_log_2023 |

In practical implementation, the data comparison is reflected in the difference in context construction efficiency before and after using the intelligent dialogue history management system. For example, without this system, each query requires reloading all relevant historical data and report text, and the input length may exceed the model's limits. With the system, summarization and pointerization reduce the input length while retaining the necessary continuity information. It can be understood that the adjustment strategy for the weighting coefficients is based on the recency of the dialogue and the relevance of the content. For example, when a user query is highly relevant to recent history, the weight of the recent original text is increased.
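The history handling described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Turn` record, the `build_context` function, and the dictionary layout of the context are hypothetical, and the weighting coefficients are not modeled. The description keeps the last three rounds verbatim, whereas Table 1 retains only round 3, so the window size is left as a parameter and the usage passes `keep_recent=1` to reproduce the table.

```python
import math
from dataclasses import dataclass


@dataclass
class Turn:
    query: str
    result_text: str      # full final risk assessment result
    summary: str          # distilled task summary
    pointers: list[str]   # pointers to stored data objects


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Relevance between two query embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_context(history: list[Turn], keep_recent: int = 3) -> dict:
    """Keep recent turns verbatim, older turns as summaries, data as pointers."""
    recent = history[-keep_recent:] if keep_recent > 0 else []
    older = history[:-keep_recent] if keep_recent > 0 else list(history)
    pointers: list[str] = []
    for turn in history:
        for pointer in turn.pointers:
            if pointer not in pointers:  # de-duplicate, keep first-seen order
                pointers.append(pointer)
    return {
        "recent_text": [f"Q: {t.query}\nA: {t.result_text}" for t in recent],
        "summaries": [t.summary for t in older],
        "pointers": pointers,
    }


if __name__ == "__main__":
    history = [
        Turn("Assess the market risk of Company A", "full market report",
             "market risk summary", ["ptr_financial_data_2023Q3"]),
        Turn("Assess the credit risk of Company A", "full credit report",
             "credit risk summary", ["ptr_credit_report_2023"]),
        Turn("Assess the operational risk of Company A", "full operational report",
             "operational risk summary", ["ptr_incident_log_2023"]),
    ]
    print(build_context(history, keep_recent=1))
```

Because the context carries pointers instead of the underlying data objects, its size grows with the number of turns rather than with the size of the referenced datasets.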
[0033] Referring to Figure 4, for the intelligent dialogue history management phase, a quantitative comparison of four efficiency indicators before and after using the system is presented. Purple bars represent indicator values without the dialogue management system, while orange bars represent indicator values with the system. Specifically, in the "context input length (KB)" dimension, the input length is 128 KB without the system and 32 KB with the system, demonstrating the compression effect of summarization and pointerization on input redundancy. In the "model loading time (s)" dimension, the loading time is 45 s without the system and 8 s with the system, reflecting the improved resource loading efficiency after data pointers replace large data objects. In the "data transfer overhead (MB)" dimension, the overhead is 28 MB without the system and 5 MB with the system, verifying that the pointerization strategy reduces network transmission load. In the "cognitive consistency score" dimension, the score is 72 without the system and 96 with the system, indicating that dynamically constructing the optimal context enhances the model's cognitive coherence across a continuing dialogue. The changes in these metrics quantify the effectiveness of the intelligent dialogue history management system in optimizing context construction efficiency, reducing resource consumption, and ensuring cognitive consistency, providing data support for the system's technical value.
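The relative changes implied by the Figure 4 values can be worked out directly. The short sketch below uses only the before and after figures quoted above; the dictionary layout and function name are illustrative.

```python
# (without system, with system) for each indicator quoted from Figure 4
metrics = {
    "context input length (KB)": (128, 32),
    "model loading time (s)": (45, 8),
    "data transfer overhead (MB)": (28, 5),
    "cognitive consistency score": (72, 96),
}


def relative_change(before: float, after: float) -> float:
    """Percentage change from before to after (negative means a reduction)."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for name, (before, after) in metrics.items():
        print(f"{name}: {relative_change(before, after):+.1f}%")
```

This gives a 75.0% reduction in input length, reductions of about 82.2% in loading time and 82.1% in transfer overhead, and an increase of about 33.3% in the consistency score.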
[0034] In one embodiment of the present invention, a split intelligent rendering architecture is used to present the risk assessment results. The financial expert model generates a lightweight dynamic rendering template containing data placeholders instead of a static report containing complete data. The dynamic rendering template defines the report's structure, text description framework, and chart positions, but specific values are identified by placeholders. After the client requests the dynamic rendering template, it requests the core data in the final risk assessment results from the data service layer through an asynchronous application programming interface. The core data refers to the key numerical indicators and conclusion labels that have been finally verified. After receiving the core data, the client completes the chart drawing and report rendering locally to achieve a visual display of the risk assessment results. The data comparison reflects the difference in network transmission and processing load. Before using the split intelligent rendering architecture, the system needed to transmit a complete static report document containing all formatted text and embedded data, which was a large amount of data. After using the split intelligent rendering architecture, the network transmission content is separated into a lightweight dynamic rendering template and a structured core data packet, which reduces the overall transmission load and improves the client's rendering flexibility.
[0035] In some embodiments, the dynamic rendering template is defined using a markup language that includes conditional statements and loop logic to support the generation of dynamic content. The core data provided by the data service layer is returned as key-value pairs or in structured JSON format, precisely matching the placeholders in the dynamic rendering template. It can be understood that client-side local rendering utilizes the built-in chart libraries of modern web browsers or mobile applications to generate visualization elements such as line charts and bar charts based on the core data. Optionally, the rendering logic for the dynamic rendering template follows the formula $R = F(T, D)$, where $R$ represents the complete visual report that is finally output, $F$ represents the client-side local rendering engine function, $T$ represents the dynamic rendering template obtained from the server, and $D$ represents the core dataset obtained from the data service layer. Executing the rendering engine function $F$ fills $D$ into the corresponding placeholders in $T$ and instantiates the chart components.
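The placeholder-filling step can be sketched as follows. This is a minimal stand-in rather than the client-side engine itself: it uses Python's standard `string.Template` in place of a browser or mobile rendering engine, omits the conditional and loop logic and the chart drawing, and the template text, field names, and values are hypothetical.

```python
import json
from string import Template

# Lightweight template: structure and wording only, values as placeholders.
REPORT_TEMPLATE = Template(
    "Risk report for $company\n"
    "Liquidity risk: $liquidity_level (cash flow coverage ratio $coverage_ratio)\n"
    "Confidence: $confidence\n"
)


def render(template: Template, core_data_json: str) -> str:
    """Fill the core data into the template's placeholders.

    Raises KeyError if the core data lacks a placeholder the template needs.
    """
    core_data = json.loads(core_data_json)
    return template.substitute(core_data)


if __name__ == "__main__":
    # Core data as it might arrive from the data service layer (assumed shape).
    payload = json.dumps({
        "company": "Company A",
        "liquidity_level": "low",
        "coverage_ratio": 1.5,
        "confidence": 0.78,
    })
    print(render(REPORT_TEMPLATE, payload))
```

Using `substitute` rather than `safe_substitute` makes a mismatch between template and core data fail loudly instead of leaving an unfilled placeholder in the report.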
[0036] In practice, the training process of the pre-trained financial expert model involves relying on financial experts to perform high-precision labeling on the original financial texts to form a high-quality seed dataset. The original financial texts include corporate prospectuses and credit rating reports. The high-precision labels cover multiple dimensions such as risk type, risk level, and impact. Synonym replacement, back-translation, and rule-based template filling techniques are used to augment the seed dataset and expand its size. Synonym replacement uses a financial terminology thesaurus to replace specific words in the original text. Back-translation translates the text into another language and then back into the original language to generate different expressions. Rule-based template filling generates new training samples based on the logical relationship between financial data and risk statements. Essentially, the financial expert model is obtained by using the data-augmented dataset, combined with a teacher-student architecture and knowledge distillation techniques, to fine-tune the basic large-scale language model in stages. This staged fine-tuning includes first fine-tuning on a general financial text understanding task, and then further fine-tuning on a risk assessment instruction following task. Optionally, the rule-based template filling technique uses predefined risk statement sentence templates to fill in entities and values from the seed dataset into new templates to generate grammatically compliant and semantically diverse new sentences. It is understandable that knowledge distillation technology, when training financial expert models, uses the output distribution of the teacher model on complex reasoning tasks as a supervisory signal to guide the training of the student model, enabling the student model to maintain high performance while having a faster response speed.
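Two of the augmentation techniques can be sketched as follows. This is a minimal sketch with hypothetical data: the thesaurus entries, the sentence template, and the record fields are invented for illustration. Back-translation is omitted because it requires a translation model.

```python
# Hypothetical financial terminology thesaurus (term -> interchangeable terms).
THESAURUS = {
    "debt": ["leverage"],
    "rating": ["grade"],
}

# Hypothetical risk statement sentence template.
TEMPLATES = [
    "{entity} shows {level} {risk_type} risk because its {metric} is {value}.",
]


def synonym_replace(text: str, thesaurus: dict[str, list[str]]) -> list[str]:
    """Return one variant per replaceable word, changing one word at a time."""
    words = text.split()
    variants = []
    for i, word in enumerate(words):
        for synonym in thesaurus.get(word.lower(), []):
            variants.append(" ".join(words[:i] + [synonym] + words[i + 1:]))
    return variants


def template_fill(records: list[dict], templates: list[str]) -> list[str]:
    """Fill entities and values from seed records into sentence templates."""
    return [template.format(**record)
            for template in templates
            for record in records]


if __name__ == "__main__":
    print(synonym_replace("Rising debt may lower the credit rating", THESAURUS))
    seed = [{"entity": "Company A", "level": "medium", "risk_type": "liquidity",
             "metric": "cash flow coverage ratio", "value": 1.2}]
    print(template_fill(seed, TEMPLATES))
```

Replacing one word at a time keeps each variant close to its labeled source sentence, so the original risk label can reasonably be reused for the variant.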
[0037] Referring to Figure 5, for the application of data augmentation techniques in Phase 3, the relationship among the original seed data volume, the augmented data volume, and the proportion of effective samples under different augmentation techniques is presented together. Specifically, the original seed data volume is the same for all techniques (approximately 1000 entries), while the augmented data volume increases from one technique to the next: synonym replacement yields approximately 2000 augmented entries, back-translation approximately 2100, rule-based template filling approximately 4800, and combined augmentation approximately 8500. The proportion of effective samples varies by technique: it is 98.2% for synonym replacement, falls to 96.7% for back-translation, rises to 99.1% for rule-based template filling owing to its structured generation logic, and falls back to 97.8% for combined augmentation because of the semantic bias risk introduced by using multiple techniques. These data indicate that the augmented data volume is not linearly positively correlated with the proportion of effective samples, so the choice of technique requires a balance between scale expansion and sample effectiveness.
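The trade-off between scale and effectiveness can be made concrete by converting the Figure 5 values into effective sample counts. The sketch below uses only the approximate figures quoted above, so the results are approximate as well; the names are illustrative.

```python
SEED_SIZE = 1000  # approximate original seed data volume

# technique -> (approximate augmented volume, effective sample proportion in %)
augmentation = {
    "synonym replacement": (2000, 98.2),
    "back-translation": (2100, 96.7),
    "rule-based template filling": (4800, 99.1),
    "combined augmentation": (8500, 97.8),
}


def effective_samples(total: int, effective_pct: float) -> int:
    """Approximate number of effective samples, rounded to a whole entry."""
    return round(total * effective_pct / 100)


def expansion_factor(total: int, seed: int = SEED_SIZE) -> float:
    """How many times larger the augmented set is than the seed set."""
    return total / seed


if __name__ == "__main__":
    for name, (total, pct) in augmentation.items():
        print(f"{name}: x{expansion_factor(total):.1f}, "
              f"about {effective_samples(total, pct)} effective samples")
```

Combined augmentation therefore still yields the most effective samples in absolute terms (about 8313) even though its effective proportion is lower than that of rule-based template filling alone.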
[0038] The above are merely preferred embodiments of the present invention and are not intended to limit the present invention in any other way. Any person skilled in the art may make changes or modifications to the above-disclosed technical content to create equivalent embodiments that can be applied to other fields. However, any simple modifications, equivalent changes, and modifications made to the above embodiments based on the technical essence of the present invention without departing from the scope of the present invention shall still fall within the protection scope of the present invention.
Claims
1. An AI-based intelligent risk assessment and decision support method for investment, characterized in that the method includes: obtaining multi-source heterogeneous financial data of a target investment object, and performing data source alignment and cleaning on the multi-source heterogeneous financial data to obtain standardized time-series data; processing the standardized time-series data into structured text to generate a textual financial report in a unified format; constructing, based on a self-verification retrieval enhancement framework, a triple index for the textual financial report to generate a semantic index, a keyword index, and a knowledge graph index; receiving a user's risk assessment query, parsing the query through an intent routing mechanism, and triggering a hybrid retrieval based on the semantic index, the keyword index, and the knowledge graph index to obtain an initial set of knowledge blocks; cross-validating the initial set of knowledge blocks for relevance, factual consistency, and financial compliance using a large language model, and selecting the knowledge blocks that pass the validation as a security knowledge context; inputting the security knowledge context and the risk assessment query into a pre-trained financial expert model to generate a preliminary risk assessment report; and decomposing and logically verifying the preliminary risk assessment report using a hierarchical intelligent agent framework to generate a final risk assessment result and decision support suggestions.
2. The investment intelligent risk assessment and decision support method based on artificial intelligence as described in claim 1, characterized in that, The process of aligning and cleaning the multi-source heterogeneous financial data to obtain standardized time-series data includes: The numerical data and text data in the multi-source heterogeneous financial data are identified, and the missing values in the numerical data are filled in using time series interpolation, and the non-standard terms in the text data are normalized by financial dictionary mapping. The filled and normalized data are aligned according to a unified timestamp benchmark to eliminate time granularity differences between different data sources; Calculate the statistical feature values of the aligned data, identify and remove abnormal data points that deviate from the preset threshold range based on the statistical feature values, and obtain standardized time series data.
3. The investment intelligent risk assessment and decision support method based on artificial intelligence as described in claim 1, characterized in that, The self-verification retrieval enhancement framework constructs a triple index for the textual financial report, generating a semantic index, a keyword index, and a knowledge graph index, including: The textual financial report is vectorized using a semantic encoding model, and the result of the vectorization is stored as a semantic index. Key financial entities and terms are extracted from the textual financial report, a term inverted list is constructed, and a keyword index is generated. The entity relationships in the textual financial report are analyzed, and a graph structure with entities as nodes and relationships as edges is constructed to generate a knowledge graph index.
4. The investment intelligent risk assessment and decision support method based on artificial intelligence as described in claim 1, characterized in that cross-validating the initial set of knowledge blocks for relevance, factual consistency, and financial compliance using a large language model, and selecting the knowledge blocks that pass the validation as the security knowledge context, includes: instructing the large language model to determine the relevance of each initial knowledge block to the topic of the risk assessment query, output a relevance score, and discard initial knowledge blocks whose relevance scores are lower than a preset relevance threshold; instructing the large language model to compare each initial knowledge block against the corresponding factual data in the standardized time-series data for consistency, and mark contradictory knowledge blocks; instructing the large language model to check the compliance of the initial knowledge block content against a preset financial compliance rule base, and mark potentially non-compliant content; and retaining only the initial knowledge blocks that have passed the relevance judgment, consistency comparison, and compliance verification, and combining them into the security knowledge context; wherein the steps for constructing the large language model include: constructing a pre-training corpus by collecting massive amounts of general text data and professional text data from the financial field; using a neural network model with a Transformer architecture as the base model, performing self-supervised pre-training on the pre-training corpus to learn general language representations and financial domain knowledge; after pre-training, performing supervised fine-tuning on the base model using an instruction fine-tuning dataset so that the model can understand and execute complex instructions related to risk assessment; and optimizing the fine-tuned model using reinforcement learning from human feedback so that the model output better meets the professional requirements and safety standards of risk assessment tasks, thereby obtaining the large language model.
5. The investment intelligent risk assessment and decision support method based on artificial intelligence as described in claim 1, characterized in that inputting the security knowledge context and the risk assessment query into a pre-trained financial expert model to generate a preliminary risk assessment report includes: training the financial expert model using a teacher-student architecture, wherein the teacher model is fully fine-tuned on a reasoning-enhanced dataset with chain-of-thought annotations to learn the reasoning logic of financial experts, and the student model reproduces the reasoning ability of the teacher model through knowledge distillation and undergoes quantization and inference optimization; and performing, by the financial expert model, inference on the risk assessment query based on the security knowledge context to generate a preliminary risk assessment report that includes risk dimension analysis, risk level determination, and references.
6. The investment intelligent risk assessment and decision support method based on artificial intelligence as described in claim 5, characterized in that decomposing and logically verifying the preliminary risk assessment report using a hierarchical intelligent agent framework to generate the final risk assessment result and decision support suggestions includes: decomposing, by the planner agent, the analysis conclusions of the preliminary risk assessment report into multiple atomic tasks, namely risk validity verification, data consistency review, and decision suggestion generation; translating, by the adapter agent, each atomic task into the call parameters of the corresponding backend data query function or rule verification function; retrieving, by the executor agent, the latest data from a verified deterministic financial data source based on the call parameters, and performing risk indicator recalculation and logical rule matching; integrating, by the synthesizer agent, the recalculation results and rule matching results of the executor agent to revise and enrich the preliminary risk assessment report, and generating a final risk assessment result with confidence level labels; and generating specific decision support suggestions based on the final risk assessment result.
7. The investment intelligent risk assessment and decision support method based on artificial intelligence as described in claim 6, characterized in that the method further includes verifying the final output by a reviewer agent: the reviewer agent checks the logical coherence between the final risk assessment result generated by the synthesizer agent and the preliminary risk assessment report; the reviewer agent verifies whether all data sources cited in the final risk assessment result have been verified by the self-verification retrieval enhancement framework; when a logical break or an unverified data reference is detected, the relevant task is returned to the planner agent for reprocessing; and the output of the final risk assessment result and decision support suggestions is approved only when all verification items pass.
8. The investment intelligent risk assessment and decision support method based on artificial intelligence as described in claim 1, characterized in that, This also includes optimizing the continuity of risk assessment based on the intelligent dialogue history management system: The system records the history of multiple rounds of dialogue with the user, including each risk assessment query and the corresponding final risk assessment result; Through real-time intent analysis, recent dialogue history is preserved in its original form, intermediate dialogue history is distilled into a task summary, and data pointers are used to replace the massive data objects in the security knowledge context. In subsequent risk assessments, an optimal context is dynamically constructed, which includes recent original text, interim summaries, and data pointers, and then input into the financial expert model.
9. The investment intelligent risk assessment and decision support method based on artificial intelligence as described in claim 1, characterized in that, This also includes presenting risk assessment results through a split intelligent rendering architecture: The financial expert model generates a lightweight, dynamically rendered template containing data placeholders, rather than a static report containing complete data. After the client requests the dynamic rendering template, it requests the core data in the final risk assessment result from the data service layer through an asynchronous application programming interface; After receiving the core data, the client completes the chart drawing and report rendering locally, realizing the visualization of the risk assessment results.
10. The investment intelligent risk assessment and decision support method based on artificial intelligence as described in claim 1, characterized in that the training process of the pre-trained financial expert model includes: having financial experts perform high-precision labeling on original financial texts to form a high-quality seed dataset; augmenting the seed dataset using synonym replacement, back-translation, and rule-based template filling techniques to expand the dataset size; and using the augmented dataset, combined with a teacher-student architecture and knowledge distillation techniques, to fine-tune the basic large-scale language model in stages to obtain the financial expert model.
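The missing-value filling and abnormal-point removal recited in claim 2 can be sketched as follows. This is a minimal sketch under stated assumptions: linear interpolation stands in for the unspecified time series interpolation, a z-score cutoff stands in for the unspecified preset threshold range, and the timestamp alignment and terminology normalization steps are not covered. All function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Optional


def interpolate_missing(values: list[Optional[float]]) -> list[float]:
    """Fill None gaps linearly; gaps at either end copy the nearest value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(values)
    for i, value in enumerate(values):
        if value is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            fraction = (i - left) / (right - left)
            filled[i] = values[left] + fraction * (values[right] - values[left])
    return filled


def remove_outliers(values: list[float], z_threshold: float = 3.0) -> list[float]:
    """Drop points whose z-score exceeds the threshold (assumed criterion)."""
    if len(values) < 2:
        return list(values)
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return list(values)
    return [v for v in values if abs(v - mu) / sigma <= z_threshold]


if __name__ == "__main__":
    series = interpolate_missing([10.0, None, 12.0, 11.0, None])
    print(series)
    print(remove_outliers([10, 11, 9, 10, 11, 9, 10, 100], z_threshold=2.0))
```

A single extreme point inflates the standard deviation it is measured against, so the z-score cutoff in this sketch may need to be lower for short series than for long ones.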