A financial knowledge document retrieval method, apparatus, device, and storage medium

CN122240811A · Pending · Publication Date: 2026-06-19 · ZHEJIANG BANGSUN TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
ZHEJIANG BANGSUN TECH CO LTD
Filing Date
2026-03-18
Publication Date
2026-06-19


Abstract

This application discloses a method, apparatus, device, and storage medium for retrieving financial knowledge documents, relating to the field of information retrieval technology. The method includes: determining candidate text blocks corresponding to the current user's question from a financial knowledge document database based on a first semantic similarity; reordering the candidate text blocks based on a second semantic similarity between the temporal context block and the current user's question, and a preset time sensitivity parameter, to obtain reordered text blocks; and generating a retrieval answer corresponding to the current user's question based on the reordered text blocks and the temporal context block. Therefore, by considering the preceding and succeeding text blocks on the timeline of business events, and taking into account the time decay of financial knowledge, the method ensures the temporal rationality of the retrieved text blocks and the current user's question, avoiding cross-time content confusion and improving the accuracy and compliance of financial knowledge document retrieval.

Description

Technical Field

[0001] This invention relates to the field of information retrieval technology, and in particular to a method, apparatus, device, and storage medium for retrieving financial knowledge documents.

Background Technology

[0002] Knowledge documents in the financial field typically include regulatory compliance systems, risk control rules, business operation procedures, and their historical revisions. Documents published at different times differ in scope of application, constraints, and implementation standards, so the knowledge evolves clearly over time. However, current retrieval-augmented generation (RAG) methods rely primarily on textual semantic similarity when building knowledge bases and performing searches, which makes it difficult to distinguish between different versions of financial rule documents during the retrieval and generation stages. This problem is particularly prominent in financial question-answering scenarios. When a user queries a credit approval rule, an anti-money laundering requirement, or a compliance operation procedure, the system may retrieve text blocks from obsolete historical versions together with text blocks from the currently valid version. In the generation stage, without time constraints and knowledge consistency checks, old and new regulations can coexist in one answer, conclusions can contradict each other, and descriptions can be inconsistent, which reduces the compliance, reliability, and usability of the answers.

[0003] Therefore, how to preserve semantic relevance while achieving temporal consistency of knowledge and avoiding cross-time content confusion during retrieval-augmented generation is a problem that needs to be solved in this field.

Summary of the Invention

[0004] In view of this, the purpose of this invention is to provide a method, apparatus, device, and storage medium for retrieving financial knowledge documents. The method reorders candidate text blocks by considering their preceding and succeeding text blocks on the timeline of business events, while also taking into account the time decay of financial knowledge. This ensures the temporal rationality of the retrieved text blocks with respect to the current user's question, avoids cross-time content confusion, and improves the accuracy and compliance of financial knowledge document retrieval. The specific solution is as follows:

In a first aspect, this application provides a method for retrieving financial knowledge documents, including:

determining candidate text blocks corresponding to the current user's question from the financial knowledge document library based on a first semantic similarity;

reordering the candidate text blocks based on a second semantic similarity between the temporal context block and the current user's question, and a preset time sensitivity parameter, to obtain reordered text blocks; wherein the temporal context block is the text block indexed by the logical pointer corresponding to the candidate text block, the preset time sensitivity parameter is a parameter set according to the type of financial knowledge to adjust the decay rate of the financial knowledge, and the logical pointer is used to index the preceding and succeeding text blocks of the candidate text block on the timeline of the business event; and

generating the retrieval answer corresponding to the current user's question based on the reordered text blocks and the temporal context block.

[0005] Optionally, reordering the candidate text blocks based on the second semantic similarity between the temporal context block and the current user's question, and the preset time sensitivity parameter, to obtain the reordered text blocks includes: determining the temporal context block corresponding to the candidate text block; calculating the second semantic similarity between the temporal context block and the current user's question, and generating a first score for the temporal context block based on the relationship between the second semantic similarity and a preset similarity threshold; and reordering the candidate text blocks based on the first semantic similarity, the second semantic similarity, the first score, and the preset time sensitivity parameter to obtain the reordered text blocks.

[0006] Optionally, generating the first score of the temporal context block based on the relationship between the second semantic similarity and the preset similarity threshold includes: if the difference between the second semantic similarity and the preset similarity threshold is positive, and the semantic scoring index corresponding to the candidate text block indicates that the semantic coherence of the candidate text block within its financial knowledge document meets a preset coherence condition, generating the first score of the temporal context block based on the difference and the semantic scoring index.

[0007] Optionally, reordering the candidate text blocks based on the first semantic similarity, the second semantic similarity, the first score, and the preset time sensitivity parameter to obtain the reordered text blocks includes: quantifying the candidate text blocks based on the preset time sensitivity parameter to obtain a time value representing the freshness of the financial knowledge; and computing a weighted combination of the first semantic similarity, the second semantic similarity, the first score, and the time value to obtain a second score, and reordering the candidate text blocks based on the second score to obtain the reordered text blocks.

[0008] Optionally, quantifying the candidate text blocks based on the preset time sensitivity parameter to obtain the time value representing the freshness of the financial knowledge includes: quantifying the candidate text blocks using a Gaussian-based time decay function and the preset time sensitivity parameter to obtain the time value representing the freshness of the financial knowledge.

[0009] Optionally, generating the retrieval answer corresponding to the current user's question based on the reordered text blocks and the temporal context blocks includes: constructing, based on the reordered text blocks and the corresponding temporal metadata, a prompt template that guides the large language model in generating the answer; and generating, using the large language model and based on the prompt template, the retrieval answer corresponding to the current user's question from the reordered text blocks and the temporal context blocks. The temporal metadata includes the publication time of the financial knowledge document corresponding to the text block, a first logical pointer and a second logical pointer used to index the preceding and succeeding text blocks on the timeline of the business event, the physical position number of the text block, the time of the business event, the semantic vector, and a semantic scoring index used to measure the semantic coherence of the text block within the financial knowledge document.

[0010] Optionally, generating the retrieval answer corresponding to the current user's question based on the reordered text blocks and the temporal context blocks includes: selecting, in descending order of relevance to the current user's question, a predetermined number of the reordered text blocks as target text blocks; and generating the retrieval answer corresponding to the current user's question based on the target text blocks and the target context blocks, where the target context blocks are those temporal context blocks that correspond to the target text blocks.

[0011] In a second aspect, this application provides a financial knowledge document retrieval device, comprising:

a candidate text block determination module, used to determine the candidate text blocks corresponding to the current user's question from the financial knowledge document library based on the first semantic similarity;

a reordering module, used to reorder the candidate text blocks based on the second semantic similarity between the temporal context block and the current user's question, and a preset time sensitivity parameter, to obtain reordered text blocks; wherein the temporal context block is the text block indexed by the logical pointer corresponding to the candidate text block, the preset time sensitivity parameter is a parameter set according to the type of financial knowledge to adjust the decay rate of the financial knowledge, and the logical pointer is used to index the preceding and succeeding text blocks of the candidate text block on the timeline of the business event; and

an answer generation module, used to generate the retrieval answer corresponding to the current user's question based on the reordered text blocks and the temporal context blocks.

[0012] In a third aspect, this application provides an electronic device, comprising: a memory for storing a computer program; and a processor for executing the computer program to implement the financial knowledge document retrieval method described above.

[0013] Fourthly, this application provides a computer-readable storage medium for storing a computer program, which, when executed by a processor, implements the financial knowledge document retrieval method described above.

[0014] Therefore, this application first determines candidate text blocks corresponding to the current user's question from a financial knowledge document library based on a first semantic similarity. It then reorders the candidate text blocks based on a second semantic similarity between the temporal context block and the current user's question, and a preset time sensitivity parameter, to obtain reordered text blocks. The temporal context block is the text block indexed by the logical pointer corresponding to the candidate text block; the preset time sensitivity parameter is a parameter set according to the type of financial knowledge to adjust the decay rate of the financial knowledge; and the logical pointer is used to index the preceding and succeeding text blocks of the candidate text block on the timeline of business events. Finally, the retrieval answer corresponding to the current user's question is generated based on the reordered text blocks and the temporal context blocks. In this way, during financial knowledge document retrieval, this application considers what comes before and after each semantically retrieved candidate text block on the timeline of business events: the candidate text blocks are reordered using their preceding and succeeding text blocks, and the time decay of financial knowledge is also taken into account. This ensures the temporal rationality of the retrieved text blocks with respect to the current user's question, avoids cross-time content confusion, and improves the accuracy and compliance of financial knowledge document retrieval.

Brief Description of the Drawings

[0015] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.

[0016] Figure 1 is a flowchart of a financial knowledge document retrieval method disclosed in this application;
Figure 2 is a flowchart of a specific financial knowledge document retrieval method disclosed in this application;
Figure 3 is a flowchart of another specific financial knowledge document retrieval method disclosed in this application;
Figure 4 is a flowchart of yet another specific financial knowledge document retrieval method disclosed in this application;
Figure 5 is a schematic structural diagram of a financial knowledge document retrieval device disclosed in this application;
Figure 6 is a structural diagram of an electronic device disclosed in this application.

Detailed Implementation

[0017] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0018] As shown in Figure 1, an embodiment of the present invention discloses a method for retrieving financial knowledge documents, including:

Step S11: Determine the candidate text blocks corresponding to the current user question from the financial knowledge document library based on the first semantic similarity.

[0019] In this application, the first step is to determine candidate text blocks corresponding to the current user's question from the financial knowledge document library. This process can be based on semantic similarity. It is understood that the text blocks corresponding to financial knowledge in the financial knowledge document library all have semantic vectors. Then, considering the similarity between these semantic vectors and the question vector corresponding to the current user's question, several text blocks can be determined from the financial knowledge document library, which are denoted as candidate text blocks.
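As an illustration of step S11, the following is a minimal sketch that assumes the first semantic similarity is cosine similarity and that the document library is an in-memory mapping from block identifiers to semantic vectors. The function names, the `top_k` parameter, and the sample vectors are hypothetical and are not taken from the application.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve_candidates(question_vec, library, top_k=3):
    """Return the top_k (block_id, first_similarity) pairs, most similar first.

    library: dict mapping a block id to its semantic vector (hypothetical layout).
    """
    scored = [(bid, cosine_similarity(question_vec, vec)) for bid, vec in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy library: two versions of a rule and one unrelated block.
library = {
    "rule_2024": [0.9, 0.1, 0.0],
    "rule_2026": [0.8, 0.2, 0.1],
    "unrelated": [0.0, 0.1, 0.9],
}
candidates = retrieve_candidates([1.0, 0.0, 0.0], library, top_k=2)
```

Note that both rule versions are returned here, which is exactly the situation the later reordering step is meant to resolve.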

[0020] Step S12: Based on the second semantic similarity between the temporal context block and the current user question, and a preset time sensitivity parameter, the candidate text blocks are reordered to obtain reordered text blocks; the temporal context block is the text block indexed by the logical pointer corresponding to the candidate text block, and the preset time sensitivity parameter is a parameter set based on the type of financial knowledge to adjust the decay rate of financial knowledge; the logical pointer is used to index the preceding and following text blocks of the candidate text block on the timeline of the business event.

[0021] Furthermore, the above steps yield candidate text blocks corresponding to the current user's question in the financial knowledge document library. To ensure the temporal relevance of the financial knowledge, these candidate text blocks can be reordered. Specifically, considering the timeline of the business events corresponding to the candidate text blocks, the preceding and subsequent text blocks on their timelines are determined, denoted as temporal context blocks. Then, based on the semantic similarity between the temporal context blocks and the current user's question, combined with a time sensitivity parameter, the candidate text blocks are reordered. It is understandable that the time sensitivity parameter used here is pre-set according to the type of financial knowledge to adjust the rate at which the importance of financial knowledge decays over time, i.e., to measure the value of financial knowledge at different times. It should be noted that the temporal context blocks here refer to the preceding and subsequent text blocks of the candidate text blocks on the timeline of their corresponding business events, i.e., considering the cause and effect of the candidate text blocks. In other words, the reordering process here takes into account the similarity between the cause and effect of the candidate text blocks on the timeline and the current user question, and also considers the decay value of the financial knowledge corresponding to the candidate text blocks over time, in order to reorder the candidate text blocks; this makes the reordered text blocks more reasonable in terms of time and more relevant to the current user question in terms of content.
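The summary states that the time value can be obtained with a Gaussian-based time decay function and the preset time sensitivity parameter, but this section does not give the exact formula. The sketch below therefore assumes the common form exp(-age² / (2σ²)), with σ playing the role of the time sensitivity parameter per knowledge type. The type names and σ values are hypothetical.

```python
import math

# Hypothetical sensitivity values (in days) per knowledge type; a smaller
# value makes that type of knowledge lose value faster.
TIME_SENSITIVITY_DAYS = {
    "interest_rate_notice": 90.0,
    "aml_regulation": 720.0,
}

def time_value(age_days, knowledge_type):
    """Assumed Gaussian decay in (0, 1]: 1.0 for new knowledge, falling toward 0 with age."""
    sigma = TIME_SENSITIVITY_DAYS[knowledge_type]
    age = max(age_days, 0.0)  # treat future-dated documents as fresh
    return math.exp(-(age ** 2) / (2.0 * sigma ** 2))
```

With these assumed values, a 180-day-old interest rate notice retains far less value than an anti-money laundering regulation of the same age, which is the per-type behaviour the time sensitivity parameter is described as controlling.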

[0022] In one specific embodiment, reordering the candidate text blocks based on the second semantic similarity between the temporal context block and the current user question, and the preset time sensitivity parameter, to obtain reordered text blocks may include: determining the temporal context block corresponding to the candidate text block; calculating the second semantic similarity between the temporal context block and the current user question, and generating a first score for the temporal context block based on the relationship between the second semantic similarity and a preset similarity threshold; and reordering the candidate text blocks based on the first semantic similarity, the second semantic similarity, the first score, and the preset time sensitivity parameter to obtain reordered text blocks. Specifically, after the semantic similarity between the temporal context block and the current user question has been calculated, the temporal context blocks can be further filtered according to the relationship between that similarity and the similarity threshold, and a first score representing the importance of the temporal context block to the candidate text block can be generated. The candidate text blocks can then be reordered based on the first semantic similarity (the semantic similarity between the candidate text block and the current user's question), the second semantic similarity (the semantic similarity between the temporal context block and the current user's question), the first score, and the time sensitivity parameter. As a result, the reordered text blocks are more reasonable in time, less prone to misinterpretation, and more relevant in content to the current user's question.
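The weighted reordering can be sketched as follows. This is only an illustration under stated assumptions: the four inputs are combined linearly, the time value has already been derived from the time sensitivity parameter, and the weights are hypothetical because this section does not specify them.

```python
from dataclasses import dataclass

@dataclass
class ScoredCandidate:
    block_id: str
    first_similarity: float   # candidate block vs. question
    second_similarity: float  # temporal context block vs. question
    first_score: float        # importance of the temporal context block
    time_value: float         # freshness in [0, 1]

# Hypothetical weights for the four terms, in field order.
WEIGHTS = (0.5, 0.2, 0.1, 0.2)

def second_score(c, weights=WEIGHTS):
    """Weighted sum of the four signals; higher means ranked earlier."""
    w1, w2, w3, w4 = weights
    return (w1 * c.first_similarity + w2 * c.second_similarity
            + w3 * c.first_score + w4 * c.time_value)

def rerank(candidates, weights=WEIGHTS):
    """Return the candidates sorted by descending second score."""
    return sorted(candidates, key=lambda c: second_score(c, weights), reverse=True)

# An obsolete rule that matches the wording slightly better than the current one.
old = ScoredCandidate("rule_2019", 0.92, 0.40, 0.0, 0.10)
new = ScoredCandidate("rule_2026", 0.88, 0.75, 0.3, 0.95)
ranked = rerank([old, new])
```

In this toy case the obsolete block wins on first semantic similarity alone, but the current block ranks first once its context similarity and freshness are included.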

[0023] Step S13: Based on the rearranged text blocks and the temporal context blocks, generate the search answer corresponding to the current user question.

[0024] In this application, the candidate text blocks are reordered by combining the semantic similarity between the temporal context blocks and the current user question, as well as the time sensitivity parameter, through the above steps. Then, the retrieval answer corresponding to the current user question is generated based on the reordered text blocks and the relevant temporal context blocks. It can be understood that combining the reordered text blocks and the temporal context blocks ensures that the text blocks used to generate the retrieval answer are blocks within a complete business event, which has a higher confidence level compared to fragmented descriptive text blocks.

[0025] In one specific embodiment, generating the retrieval answer corresponding to the current user's question based on the reordered text blocks and the temporal context blocks may include: constructing, based on the reordered text blocks and the corresponding temporal metadata, a prompt template that guides the large language model in generating the answer; and generating, using the large language model and based on the prompt template, the retrieval answer corresponding to the current user's question from the reordered text blocks and the temporal context blocks. The temporal metadata includes the publication time of the financial knowledge document corresponding to the text block, a first logical pointer and a second logical pointer for indexing the preceding and succeeding text blocks on the timeline of the business event, the physical position number of the text block, the time of the business event, a semantic vector, and a semantic scoring index that measures the semantic coherence of the text block within the financial knowledge document. It should be noted that this application defines temporal metadata for each text block, modeling the effective / release time of the financial knowledge document and the business event time in a unified way and writing key information such as the time dimensions and physical position into the temporal metadata model. This model not only records "when the knowledge is effective" but also retains the block's temporal context within compliance processes or risk control reports, which addresses the fundamental problem of lost temporal information and broken context after financial knowledge documents are segmented. Furthermore, when generating retrieval answers, a prompt template with time consistency constraints can be built from the reordered text blocks and their temporal metadata. This guides the large language model to follow the chronological order of the financial knowledge when generating the retrieval answer and avoids fusing knowledge across time periods. When the large language model then generates the retrieval answer, the temporal context blocks corresponding to the reordered text blocks are also taken into account, to ensure the accuracy and interpretability of the final answer.
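A prompt template with a time consistency constraint might be assembled as in the sketch below. The wording of the instructions, the dictionary keys, and the choice to list evidence in business event order are all assumptions made for illustration. The application does not disclose the template text in this section.

```python
from datetime import datetime, timezone

def format_ms(timestamp_ms):
    """Render a millisecond Unix timestamp as a YYYY-MM-DD date in UTC."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")

def build_prompt(question, blocks):
    """Build a prompt from blocks given as dicts with hypothetical keys
    'text', 'doc_time_ms' (publication time) and 'event_time_ms' (business event time)."""
    ordered = sorted(blocks, key=lambda b: b["event_time_ms"])
    lines = [
        "Answer using only the evidence below.",
        "Evidence is listed in business-event order; if two items conflict, "
        "prefer the one from the more recently published document and say so.",
        "",
    ]
    for i, b in enumerate(ordered, start=1):
        lines.append(
            f"[{i}] published {format_ms(b['doc_time_ms'])}, "
            f"event {format_ms(b['event_time_ms'])}: {b['text']}"
        )
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)
```

The resulting string would be passed to whichever large language model the deployment uses; that call is omitted here because it depends on the chosen model API.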

[0026] In a specific embodiment, the temporal metadata is obtained by constructing a "temporal doubly linked index" structure and writing key information such as the time dimensions and physical position into the temporal metadata model.

The absolute time dimension is the knowledge document time, which records the point in time when the knowledge became effective, was revised, or was published. The system receives the original financial knowledge documents (such as credit regulations and approval processes) and obtains the corresponding knowledge document time. As a basic time attribute of the document (effective date, revision date, or entry date), this time serves as the global benchmark for the freshness of the knowledge document and is later used to calculate the adaptive nonlinear time decay.

The physical order dimension and the business event time dimension address the rigid physical arrangement of the original documents by constructing a temporal topology based on the order of business events. First, the knowledge document is segmented sequentially to produce an initial set of text blocks, and each block is assigned a globally unique, monotonically increasing physical position number that strictly reflects the order of the document content within the risk handling process or compliance logic. Next, natural language processing is applied to each block to extract the business event time from its text, for example the operation time in a risk operation record or the approval time in a credit approval process ("2026-01-02 12:10:05 Company A applied for a 5 million yuan working capital loan..."), and this time is converted into a millisecond timestamp: 1767327005000. If a block contains no explicit time, a temporal smoothing mechanism is activated: following the physical order of the original document, the system traces back to the most recent block that has a temporal anchor, and the current block inherits that block's business event time as its own. The inheritance level is recorded at the same time, so that the logical temporal sequence can still be anchored in text that carries no time stamp.

The doubly linked index based on business temporal order (preceding and succeeding text blocks) is then constructed. All blocks are globally and linearly reordered by their extracted or inherited business event times, and the temporal doubly linked index in each block's temporal metadata is defined according to the reordered sequence. That is, the preceding pointer points to the set of blocks immediately before the current text block on the business logic timeline, and the succeeding pointer points to the set of blocks immediately after it on that timeline, rather than to the physically adjacent blocks in the original document. This allows isolated text to be reconstructed into a chain structure with "business causal / sequential" relationships.

Note that the temporal ordering is a global reordering of blocks by both time and physical position, performed after business event times have been extracted from the text blocks. The chronological order defaults to the time extracted from each block. The physical position order is used by the temporal smoothing strategy when a block cut in document order carries no time information: the block takes the business event time of the preceding block and records the corresponding inheritance level, which yields more reasonable temporal information. This temporal information is stored in the doubly linked index of each text block. The temporal context, by contrast, refers to the actual content of the blocks that precede and succeed the current target text block in the global chronological order.
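The index construction described in this paragraph can be sketched as below. Several points are assumptions: the times in the text are read as UTC+8 (which is what makes the example time above map to 1767327005000), each pointer holds a single neighbour rather than a set, ties are broken by physical position, and blocks that precede the first temporal anchor are left unlinked. All names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=8))  # assumption: times in the text are UTC+8

def to_ms(text_time):
    """Convert 'YYYY-MM-DD HH:MM:SS' (assumed UTC+8) to a millisecond Unix timestamp."""
    dt = datetime.strptime(text_time, "%Y-%m-%d %H:%M:%S").replace(tzinfo=CST)
    return int(dt.timestamp() * 1000)

def build_index(event_times):
    """event_times[i] is the time string extracted from block i (physical order), or None.

    Returns one dict per block holding its event time, inheritance level, and
    prev/next pointers (physical position numbers) along the business timeline.
    """
    blocks, last_ms, level = [], None, 0
    for pos, raw in enumerate(event_times):
        if raw is not None:
            last_ms, level = to_ms(raw), 0   # this block is a temporal anchor
        elif last_ms is not None:
            level += 1                       # inherit from the most recent anchor
        blocks.append({"pos": pos, "event_ms": last_ms, "inherit_level": level,
                       "prev": None, "next": None})
    # Global reorder by business event time, physical position as tie-breaker.
    timeline = sorted((b for b in blocks if b["event_ms"] is not None),
                      key=lambda b: (b["event_ms"], b["pos"]))
    for before, after in zip(timeline, timeline[1:]):
        before["next"], after["prev"] = after["pos"], before["pos"]
    return blocks
```

For example, `build_index(["2026-01-02 12:10:05", None, "2026-01-01 09:00:00"])` places the third physical block first on the timeline, while the second block inherits the time of the first and follows it.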

[0027] Furthermore, to score the temporal context association (i.e., to compute the semantic scoring index), the vector of the current text block and the vectors of its default context are located through the doubly linked index, and the semantic scoring index is calculated from them. This index measures whether the current text block is semantically coherent within the document. A high value indicates that the text block is strongly correlated with its temporal context and contributes significantly to the retrieval answer; a low value indicates that a topic shift may have occurred.
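The formula for the semantic scoring index is not recoverable from this text. One plausible reading, shown here purely as an assumption, is the cosine similarity between the block's vector and the mean of the vectors of its preceding and succeeding blocks. The function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def coherence_index(block_vec, neighbor_vecs):
    """Assumed index: similarity between a block and the mean of its context vectors."""
    if not neighbor_vecs:
        return 0.0  # no temporal context, so no coherence evidence
    dim = len(block_vec)
    mean = [sum(v[i] for v in neighbor_vecs) / len(neighbor_vecs) for i in range(dim)]
    return cosine(block_vec, mean)
```

Under this reading, a block whose neighbours discuss the same topic scores close to 1, while a block surrounded by unrelated content scores close to 0, matching the high and low cases described above.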

[0028] It will be understood that each text block carries temporal metadata, a structured data object stored alongside the block that represents the attributes of the text content in the time dimension and its temporal relationships within the logical linear topology of the original document. Unlike traditional static tags, the temporal metadata is also a dynamic descriptor with a doubly linked index, through which the temporal context of the text block can be retrieved. This design aims to resolve knowledge timeliness conflicts and breaks in contextual logic during retrieval-augmented generation. The temporal metadata model can be a seven-tuple with the following elements:

Physical order dimension: the original position number of the text block within the original compliance document or risk report, which ensures that the logical order and chronological structure of the original document are strictly preserved during knowledge retrieval and generation.
Absolute time dimension: the knowledge document time, which records when the knowledge became effective, was revised, or was published, and provides the benchmark for the subsequent adaptive nonlinear time decay calculation.
Business event time dimension: the specific timestamp extracted from each text block, which reflects where the event it describes falls in the chronological sequence of events within the knowledge document.
Text block vector dimension: the content of the text block vectorized by an embedding model, a multi-dimensional numerical vector used for calculating semantic relevance.
Doubly linked index (the preceding and succeeding logical pointers, two elements): reconstructs isolated text blocks into a chain structure with cause-and-effect relationships through preset logical pointers.
Temporal correlation vector dimension: the semantic scoring index, which characterizes the logical coherence between the block and its neighborhood.
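The seven elements listed in this paragraph could be held in a record such as the one sketched below. The field names and types are hypothetical, and the pointers are stored as lists of physical position numbers so that a pointer can reference a set of neighbouring blocks.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TemporalMetadata:
    position: int                  # physical position number in the original document
    doc_time_ms: int               # knowledge document time (effective/revised/published)
    event_time_ms: Optional[int]   # business event time, extracted or inherited
    vector: List[float]            # embedding of the block text
    prev_positions: List[int] = field(default_factory=list)  # preceding blocks on the timeline
    next_positions: List[int] = field(default_factory=list)  # succeeding blocks on the timeline
    coherence: float = 0.0         # semantic scoring index

def temporal_context(meta, store):
    """Resolve both logical pointers against store (position -> TemporalMetadata)."""
    return [store[p] for p in meta.prev_positions + meta.next_positions if p in store]

# Two blocks linked along the business timeline.
a = TemporalMetadata(0, 1767225600000, 1767229200000, [1.0, 0.0])
b = TemporalMetadata(1, 1767225600000, 1767327005000, [0.9, 0.1], prev_positions=[0])
a.next_positions.append(1)
store = {0: a, 1: b}
```

Keeping the pointers as position numbers rather than object references makes the record easy to serialize alongside the text block in a vector store.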

[0029] In one specific embodiment, generating a search answer corresponding to the current user question based on the rearranged text blocks and the temporal context blocks may include: determining a predetermined number of text blocks in the rearranged text blocks as target text blocks in descending order of their relevance to the current user question; generating a search answer corresponding to the current user question based on the target text blocks and the target context blocks; wherein the target context blocks are the text blocks in the temporal context blocks that correspond to the target text blocks. Specifically, the text blocks used in generating the search answer may be a portion of the rearranged text blocks, and a certain number of text blocks are selected to generate the search answer in descending order of their relevance to the current user question; this further ensures the accuracy and interpretability of the generated answer.
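Selecting the target text blocks and their target context blocks can be sketched as follows, assuming the reordered blocks are already sorted by descending relevance and that the context of each block is available as a list of block identifiers. The function name and the `top_n` default are hypothetical.

```python
def select_targets(reordered, context_of, top_n=2):
    """Pick the top_n reordered blocks and gather their temporal context blocks.

    reordered: block ids, most relevant first.
    context_of: dict mapping a block id to the ids of its temporal context blocks.
    Returns (target_ids, target_context_ids); context ids exclude the targets
    themselves and are de-duplicated in first-seen order.
    """
    targets = reordered[:top_n]
    seen = set(targets)
    contexts = []
    for bid in targets:
        for ctx in context_of.get(bid, []):
            if ctx not in seen:
                seen.add(ctx)
                contexts.append(ctx)
    return targets, contexts
```

De-duplication matters because adjacent target blocks on the same business timeline usually share context blocks, and repeating them would waste prompt space.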

[0030] Therefore, this application defines a seven-tuple temporal metadata for each text block during the financial knowledge document retrieval process, and models the document's effective / published time in a unified manner with the business event time. This temporal metadata model not only records "when the knowledge is effective," but also retains its temporal context in compliance processes or risk control reports, solving the fundamental problem of lost temporal information and broken context after financial document segmentation. Currently, most retrieval technologies use "fixed window slicing" or "sliding window expansion," which is a blind assumption of physical proximity, i.e., forcibly taking the preceding and following N blocks as context. In financial documents (such as long audit reports or mixed approval workflows), key causal evidence is often broken due to segmentation, or mixed with a large amount of irrelevant physical neighbor noise, resulting in a logically broken generated answer. This application, by extracting business event time and further constructing a bidirectional chained index structure, achieves a leap from physical space to logical temporal space; it can better identify which blocks belong to the same temporal context, thereby improving the information integrity of the overall answer. Understandably, by rearranging the preceding and succeeding text blocks corresponding to the candidate text blocks, and taking into account the time decay of financial knowledge, the reasonableness of the temporal correlation between the retrieved text blocks and the current user's question can be ensured, thus avoiding cross-time content confusion and improving the accuracy and compliance of financial knowledge document retrieval.

[0031] The following describes the reordering of text blocks. As shown in Figure 2, it specifically includes:

Step S21: Determine the temporal context block corresponding to the candidate text block.

[0032] In this embodiment, the temporal metadata of a candidate text block includes its corresponding temporal context block. It is understood that the construction process of the temporal metadata considers the timeline of the business events corresponding to the financial knowledge text block, and uses the immediately preceding and succeeding text blocks on the timeline as the temporal context blocks corresponding to that financial knowledge text block. In other words, the temporal metadata constructs a doubly linked index based on the temporal sequence of business time. Through preset logical pointers, isolated financial knowledge text blocks can be reconstructed into a chain structure with a "cause-effect" relationship; that is, a single financial knowledge text block corresponds to two logical pointers used to index the immediately preceding and succeeding text blocks on the timeline of its business events. Furthermore, the corresponding temporal context block can be determined through the logical pointers corresponding to the candidate text block.
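
The doubly linked index described above can be illustrated with a minimal sketch. The class name `Block`, its field names, and the helper functions are hypothetical and are not taken from this application; the sketch only assumes that each text block carries a business event time and two logical pointers to its immediate neighbors on the timeline.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Block:
    block_id: str
    event_time: date                 # business event time (hypothetical field name)
    text: str
    prev_id: Optional[str] = None    # first logical pointer: predecessor on the timeline
    next_id: Optional[str] = None    # second logical pointer: successor on the timeline


def link_by_event_time(blocks: list[Block]) -> dict[str, Block]:
    """Sort blocks by business event time and set the two logical pointers."""
    ordered = sorted(blocks, key=lambda b: b.event_time)
    for i, blk in enumerate(ordered):
        blk.prev_id = ordered[i - 1].block_id if i > 0 else None
        blk.next_id = ordered[i + 1].block_id if i < len(ordered) - 1 else None
    return {b.block_id: b for b in ordered}


def temporal_context(index: dict[str, Block], block_id: str) -> list[Block]:
    """Follow the logical pointers to get the immediate temporal neighbors."""
    blk = index[block_id]
    return [index[i] for i in (blk.prev_id, blk.next_id) if i is not None]


# Blocks are given out of order on purpose; linking depends on event time only.
index = link_by_event_time([
    Block("c", date(2024, 3, 1), "collection record after default"),
    Block("a", date(2023, 1, 10), "risk access basis before loan approval"),
    Block("b", date(2023, 2, 5), "loan approval decision"),
])
```

In this sketch the temporal context of block "b" is resolved purely through its pointers, regardless of where the blocks sat physically in the source document.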

[0033] Step S22: Calculate the second semantic similarity between the temporal context block and the current user question.

[0034] Furthermore, the re-ranking process needs to consider the semantic similarity between the temporal context blocks and the current user question, which is used to weight relevant candidate text blocks. Understandably, a candidate text block is important not only because it matches the current user question, but also because its logical chain (the logical timeline of the business event) is highly consistent with the user question. Considering the semantic similarity between the temporal context and the current user question in this way avoids misinterpretation and ensures the accuracy of the final retrieval answer.

[0035] Step S23: If the difference between the second semantic similarity and the preset similarity threshold is positive, and the semantic scoring index corresponding to the candidate text block indicates that the semantic coherence of the candidate text block in the corresponding financial knowledge document meets the preset coherence condition, then the first score of the temporal context block is generated based on the difference and the semantic scoring index.

[0036] It should be noted that the current temporal context block is only considered for inclusion in the reordering step of its corresponding candidate text block when the second semantic similarity between the temporal context block and the current user question exceeds a certain threshold. Specifically, by combining the relationship between the second semantic similarity and the preset similarity threshold, and the first semantic similarity between the candidate text block and the current user question, a reordering score, i.e., the first score, can be generated for the corresponding candidate text block. Specifically, the calculation process considers two cases: if the second semantic similarity is not greater than the corresponding threshold, the score is directly 0; if the second semantic similarity is greater than the corresponding threshold, the difference between the second semantic similarity and the corresponding similarity threshold is considered, and a score is generated by combining it with the semantic scoring index of the corresponding candidate text block, denoted as the first score.
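
The two cases above can be sketched as a small gating function. The text states only that the first score is generated from the difference and the semantic scoring index; combining them by a product, and the parameter names `threshold`, `coherence`, and `min_coherence`, are assumptions made for illustration.

```python
def first_score(sim_ctx: float, threshold: float, coherence: float,
                min_coherence: float = 0.0) -> float:
    """Gated gain for a temporal context block.

    Returns 0 unless the second semantic similarity exceeds the threshold and
    the candidate block's coherence index meets the coherence condition.
    """
    diff = sim_ctx - threshold
    if diff <= 0 or coherence < min_coherence:
        return 0.0
    # Assumption: the difference and the coherence index are multiplied.
    return diff * coherence
```

A context block whose similarity equals the threshold contributes nothing, which matches the "not greater than the threshold" case in the text.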

[0037] Step S24: Quantify the candidate text blocks based on preset time sensitivity parameters to obtain the time value that represents the freshness of financial knowledge.

[0038] Correspondingly, the re-ranking also considers the timeliness of financial knowledge, that is, the impact of time changes on the importance of financial knowledge. Here, a time sensitivity parameter is pre-set for each type of financial knowledge. For two different types of information—"interest rate adjustments and risk warnings" and "basic laws and regulations and industry terminology"—the rate of decay of knowledge value differs, meaning the corresponding time sensitivity parameters are different. Furthermore, based on the knowledge type of the candidate text blocks, the corresponding time sensitivity parameter can be used to quantify the time value representing the freshness of financial knowledge.

[0039] In a specific embodiment, a Gaussian-based time decay function, combined with a preset time sensitivity parameter, can be used to quantify the candidate text blocks and obtain the time value representing the freshness of financial knowledge. It is understood that the exponential function produces a bell-shaped curve (i.e., a Gaussian curve), thus introducing nonlinearity and more realistically simulating the accelerated decay of financial knowledge over time.

[0040] Step S25: Perform a weighted calculation on the first semantic similarity, the second semantic similarity, the first score, and the time value, and reorder the candidate text blocks based on the calculated second score to obtain the reordered text blocks.

[0041] Then, a weighted calculation can be performed on the first semantic similarity, the second semantic similarity, the first score, and the calculated time value; the relevant weight coefficients represent the basic semantic relevance, the timeliness weight, and the temporal context relevance potential, respectively. For example, a risk case that is older but extremely complete in the financial logic chain and highly coordinated with its context can still stand out in the weighted score.

[0042] Therefore, during the re-ranking process, this application introduces the semantic similarity between the user question and the temporal context block corresponding to the candidate text block, as well as the semantic scoring index representing the semantic coherence of the candidate text block, and considers adaptive nonlinear time decay. A neighboring block (temporal context block) takes part in scoring only when it is logically well integrated with the current business process, which achieves dynamic context expansion on demand. Re-ranking that integrates multiple considerations such as time decay can improve the sensitivity and robustness of distinguishing new and old knowledge, and enhance the rationality and accuracy of retrieval.

[0043] As shown in Figure 3, this embodiment provides a specific method for retrieving financial knowledge documents, involving steps such as document access and time information acquisition, text segmentation and temporal context construction, semantic vector generation and semantic similarity retrieval, re-ranking, and answer generation. The two parts, index creation and retrieval enhancement, are described in detail below with reference to the flowchart of the financial knowledge document retrieval method shown in Figure 4. The first step is the index construction process based on the temporal metadata model. The core of the index construction phase lies in going beyond storing isolated text blocks: a "temporal doubly linked index" structure is built, key information such as the time dimension and physical location is defined within the temporal metadata model, and the temporal metadata corresponding to each text block is generated and stored in association with that block. Specific details of the temporal metadata can be found in the above embodiments and are not repeated here.

[0044] Then, upon receiving the user's question, semantic retrieval is performed on the text blocks using a text vector index. Specifically, a corresponding semantic vector is generated for each text block, and a question vector is generated for the user question. Based on the semantic similarity between the question vector and the semantic vectors of the text blocks, a set of candidate text blocks with high similarity is retrieved from a vector database.
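
The initial recall step can be sketched as follows. The text does not name the similarity measure; cosine similarity is assumed here, and an in-memory dictionary stands in for the vector database. The function names and the parameter `k` are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_candidates(question_vec: list[float],
                      block_vecs: dict[str, list[float]],
                      k: int = 2) -> list[tuple[str, float]]:
    """Return the k block ids most similar to the question, with their scores."""
    scored = [(bid, cosine(question_vec, vec)) for bid, vec in block_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
top = recall_candidates([1.0, 0.0], vecs, k=2)
```

The scores returned here play the role of the first semantic similarity in the later re-ranking steps.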

[0045] The next step is a multi-factor re-ranking that integrates semantic similarity, temporal context gain, and time decay. For the candidate text blocks initially recalled by semantic retrieval, their semantic similarity scores are combined with the temporal context and time information in the temporal metadata to perform a deeper reordering. The gain factor of the temporal context (i.e., the gain effect of the temporal context blocks on the candidate text block) is considered. Specifically, for each text block in the initially recalled set, the doubly linked index in its temporal metadata (the two logical pointers) is used to retrieve the vectors of its direct context blocks, and the gain factor is calculated as follows: 1) calculate the semantic similarity between the vector of the temporal context block and the user query vector; 2) perform a threshold-based dynamic gain calculation that keeps high-relevance context and rejects low-relevance context. A gating threshold is introduced: the weighting of the central block (the candidate text block) is activated only when the relevance of its neighbors (the temporal context) to the user's question exceeds this threshold. A semantic scoring index is also introduced here to characterize the semantic coherence between the candidate text block and its temporal context; even if the temporal context is related to the user's question, the gain is suppressed if the logical connection between the temporal context and the candidate text block in the original financial knowledge document is very weak. This ensures that the extended context has native logical consistency.

[0046] It should be noted that the temporal context gain factor (hereinafter referred to as the "gain factor") refers to the quantitative feedback generated during the re-ranking stage based on the strength of the semantic similarity between the user question and the neighborhood of a candidate text block (its temporal context blocks). Simply put, a block is important not only because it matches the user question, but also because the overall logical chain it belongs to aligns closely with the user question. The gain factor thus improves the ranking of text blocks that sit within a complete logical chain by identifying the semantic relevance of their temporal context blocks. In addition, a "temporal consistency probe" can be performed on candidate blocks during the re-ranking stage: only blocks that satisfy the business logic order on the timeline (preceding events come before subsequent events) can participate in the scoring calculation through the gain factor.

[0047] Furthermore, considering the impact of time on the timeliness of financial knowledge, an adaptive nonlinear time decay calculation is performed. To address timeliness conflicts in financial risk control knowledge, a time decay function based on a Gaussian distribution can be introduced to quantify the freshness value of knowledge. The function takes three quantities. The first is the timestamp extracted from the temporal metadata tuple, which represents the document timestamp of the candidate text block. The second is the current query timestamp, representing "now" or "the moment the transaction occurred"; in financial risk control it is usually the second at which the system initiates the search, and it serves as the origin for measuring the freshness of all knowledge. The third is an adaptive time sensitivity parameter, which can be adjusted dynamically based on document type: for "instantaneous fluctuation" risks the parameter is small and decay is extremely rapid, while for "basic legal" documents it is relatively large, ensuring that old knowledge still has reference value. The exponential function exp transforms the squared time difference into a smooth bell-shaped curve (a Gaussian curve) and introduces a nonlinear mapping. Compared with linear calculation, exp more realistically simulates the accelerated invalidation of financial knowledge over time and keeps the weighted score in the positive range (0, 1], thereby improving the sensitivity and robustness of distinguishing old and new financial knowledge.
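
A minimal sketch of such a decay function is given below. The text specifies only a Gaussian shape over the squared time difference with a type-dependent sensitivity parameter; the exact normalization (a factor of 2 in the denominator), the unit of days, the dictionary `SIGMA_DAYS`, and its values are assumptions chosen for illustration.

```python
import math
from datetime import date

# Hypothetical per-type sensitivity values in days; not taken from the source.
SIGMA_DAYS = {"rate_adjustment": 30.0, "basic_law": 3650.0}


def time_value(doc_date: date, query_date: date, knowledge_type: str) -> float:
    """Gaussian-shaped freshness score; equals 1.0 when the two dates coincide.

    Mathematically the value lies in (0, 1]; for very old documents with a
    small sigma the floating-point result may underflow to 0.0.
    """
    sigma = SIGMA_DAYS[knowledge_type]
    delta = (query_date - doc_date).days
    return math.exp(-(delta ** 2) / (2 * sigma ** 2))


# The same one-year-old document scored under two knowledge types.
fast = time_value(date(2025, 1, 1), date(2026, 1, 1), "rate_adjustment")
slow = time_value(date(2025, 1, 1), date(2026, 1, 1), "basic_law")
```

With these assumed values, a one-year-old rate adjustment is scored close to zero while a one-year-old basic legal text keeps almost all of its value.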

[0048] Regarding time decay, simply put, in financial risk control scenarios regulatory policies and business rules from 2025 are generally more important than those from 2010. Time decay means that during retrieval, newer documents are assigned higher scores and older documents are assigned lower scores. Non-linear time decay refers to the fact that the obsolescence of financial knowledge is not uniform: within a certain period (such as six months after a policy is released), the value of knowledge is well maintained, with almost no drop in score. Finally, adaptive non-linear time decay means that the rate of value decay varies among different types of financial knowledge, so the decay rate can be adjusted automatically according to the type. For example, for information such as "interest rate adjustments" and "risk warnings," new information released today renders yesterday's information useless; it decays very quickly, so only the latest information is considered. However, for information such as "basic laws and regulations" and "industry-standard terminology," even information from five years ago remains valid, so the time sensitivity parameter can be adjusted accordingly to make its time decay slower.

[0049] Furthermore, a comprehensive score (i.e., the second score) for each candidate text block can be calculated by combining semantic similarity (the first semantic similarity between the candidate text block and the user question, and the second semantic similarity between the temporal context block and the user question), the temporal context gain factor, and the time decay value. The formula combines these terms using three preset weighting coefficients, which represent basic semantic relevance, timeliness weight, and temporal context relevance potential, respectively. It is important to emphasize that the time decay value and the temporal context gain factor act synergistically. For example, a risk case that is somewhat older (slightly lower time decay value) but extremely complete in the financial logic chain and highly coordinated with its context (extremely high gain factor) can still stand out through comprehensive scoring.
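
The comprehensive scoring can be sketched as below. The exact formula is not reproduced in this text, so a plain linear combination is assumed, with the second semantic similarity entering only through the gain factor; the names `alpha`, `beta`, `gamma` and their default values are hypothetical.

```python
def second_score(sim_block: float, time_val: float, gain: float,
                 alpha: float = 0.5, beta: float = 0.3, gamma: float = 0.2) -> float:
    """Assumed linear combination of the three factors.

    alpha: basic semantic relevance weight
    beta:  timeliness weight
    gamma: temporal context relevance weight
    """
    return alpha * sim_block + beta * time_val + gamma * gain


def rerank(candidates: list[dict]) -> list[dict]:
    """Sort candidates by descending comprehensive score."""
    return sorted(
        candidates,
        key=lambda c: second_score(c["sim"], c["time"], c["gain"]),
        reverse=True,
    )


ranked = rerank([
    {"id": "new_isolated", "sim": 0.8, "time": 1.0, "gain": 0.0},
    {"id": "old_coherent", "sim": 0.8, "time": 0.6, "gain": 0.9},
])
```

Under these assumed weights, the older block with a strong temporal context gain outranks the newer isolated block, which mirrors the example in the paragraph above.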

[0050] To improve the accuracy of the retrieved answers, we can select contextual expansions for candidate text blocks, using the corresponding temporal context blocks as extended text blocks. Specifically, based on the top-scoring text blocks in the rearranged text blocks, we can determine whether they contain contextual block content that meets a threshold condition. If so, the corresponding contextual blocks are used as important information expansions for the candidate text blocks. This ultimately yields a set of text blocks (including the rearranged text blocks and their corresponding temporal context blocks) that are semantically and temporally relevant to the user's question.

[0051] Furthermore, in the process of generating search answers, it is first necessary to construct a prompt word template to guide the large language model in generating answers, and then use the prompt word template to generate the final search results. Specifically, when constructing the prompt word template, the rearranged text blocks and their corresponding temporal metadata are combined to create a prompt word template with temporal consistency. This guides the large language model to follow the knowledge time order when generating answers, avoiding the mixing of cross-document and cross-temporal knowledge, and improving the accuracy and interpretability of the answers. In a specific embodiment, during the process of generating the prompt word template, the logical pointers provided by the temporal metadata can also be considered. Based on the rearranged text blocks and their corresponding temporal context blocks, as well as the relevant temporal metadata, the prompt word template is constructed to ensure that the content of the pulled prompt words has native temporal coherence, thereby improving the overall logicality of the question-and-answer information. It should be noted that the prompt words can instruct the large language model to follow the following principles during the generation process: 1) Understand and reason based on the chronological order of the provided text chunks; 2) When a newer text chunk modifies, replaces, or conflicts with the content of an earlier text chunk, only the most recent valid conclusion should be retained; 3) It is prohibited to output rules, processes, or conclusions from different time stages in parallel or mixed; 4) If multiple retrieved chunks are valid for the question, the answer should be output in the form of a chronological progression.
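
A prompt template that enforces the four principles above might be assembled as in the following sketch. The function name `build_prompt`, the layout of the prompt, and the use of a single date per chunk are assumptions; the principle wording is taken from the paragraph above.

```python
from datetime import date

PRINCIPLES = (
    "1) Understand and reason based on the chronological order of the provided text chunks.",
    "2) When a newer text chunk modifies, replaces, or conflicts with an earlier one, "
    "retain only the most recent valid conclusion.",
    "3) Do not output rules, processes, or conclusions from different time stages "
    "in parallel or mixed.",
    "4) If multiple retrieved chunks are valid for the question, answer in the form "
    "of a chronological progression.",
)


def build_prompt(question: str, chunks: list[tuple[date, str]]) -> str:
    """Assemble a prompt whose context section is ordered oldest first."""
    ordered = sorted(chunks, key=lambda c: c[0])
    lines = ["Follow these principles:"]
    lines.extend(PRINCIPLES)
    lines.append("Context (oldest first):")
    lines.extend(f"[{d.isoformat()}] {text}" for d, text in ordered)
    lines.append(f"Question: {question}")
    return "\n".join(lines)


prompt = build_prompt(
    "Which approval limit applies?",
    [(date(2025, 1, 1), "Limit raised to 2M."), (date(2020, 1, 1), "Limit set to 1M.")],
)
```

Ordering the chunks before they reach the model keeps the chronological structure explicit in the prompt itself, independent of the order in which the re-ranker returned them.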

[0052] Therefore, a temporal context gain factor is introduced during the re-ranking stage. Only adjacent blocks that are highly relevant to the user query and have strong logical coherence with the central block participate in the scoring calculation. Unlike the indiscriminate splicing of neighboring blocks, this approach effectively prevents misinterpretation (such as rule references lacking preconditions), ensuring that the content possesses both semantic relevance and original logical consistency. This is particularly suitable for scenarios with strong process dependencies, such as credit approval and anti-money laundering. Furthermore, this application introduces the modulating effect of "contextual logic strength" on time weights during the re-ranking stage, ensuring that high-scoring results are both timely and logically complete. Common existing methods either rank by semantic similarity alone (ignoring time) or linearly superimpose time as an independent weight, failing to consider the core financial risk control requirement that "a slightly older rule within a complete business chain may be more valuable than an isolated new fragment." Further, this application introduces an adaptive nonlinear time decay function that uses an exponential function to simulate the accelerated invalidation of financial knowledge over time, improving the sensitivity and robustness of distinguishing old and new knowledge. Dynamically adjusting the adaptive sensitivity parameter enables differentiated management of "interest rate adjustments" (rapid decay) and "basic laws" (slow decay), ensuring the timeliness and accuracy of search results. This reordering, which considers the temporal context gain factor, semantic relevance, and time decay, makes the retrieval more comprehensive, thereby improving the accuracy of the search results.
It should be noted that currently common retrieval methods in this field often only recall fragmented segments based on semantic similarity before generating answers. Due to a lack of awareness of the actual business time sequence between segments, it is easy to provide information that occurs after the "result" as the "cause" to the large model (e.g., mistaking "collection records after default" for "risk access basis before loan approval"). This confusion in temporal logic can lead to the large model generating erroneous conclusions with reversed causality, posing a significant security risk in rigorous financial approval and auditing scenarios. This application, however, provides a temporal consistency verification mechanism by combining temporal metadata during reordering, ensuring causal alignment between the search results and the generated logic.

[0053] As shown in Figure 5, this embodiment discloses a financial knowledge document retrieval device, including: a candidate text block determination module 11, used to determine the candidate text blocks corresponding to the current user question from the financial knowledge document library based on the first semantic similarity; a reordering module 12, used to reorder the candidate text blocks based on the second semantic similarity between the temporal context block and the current user question, and a preset time sensitivity parameter, to obtain reordered text blocks, where the temporal context block is the text block indexed by the logical pointer corresponding to the candidate text block, the preset time sensitivity parameter is a parameter set based on the type of financial knowledge to adjust the decay rate of financial knowledge, and the logical pointer is used to index the preceding and succeeding text blocks of the candidate text block on the timeline of the business event; and an answer generation module 13, used to generate the retrieval answer corresponding to the current user question based on the reordered text blocks and the temporal context blocks.

[0054] Therefore, this application considers the cause and effect of candidate text blocks obtained by semantic similarity retrieval on the timeline of business events during the financial knowledge document retrieval process. Specifically, it rearranges the candidate text blocks by combining the corresponding preceding and succeeding text blocks, while also considering the time decay of financial knowledge. This ensures the reasonableness of the temporal correlation between the retrieved text blocks and the current user's question, avoids cross-time content confusion, and improves the accuracy and compliance of financial knowledge document retrieval.

[0055] In one specific embodiment, the reordering module 12 may include: A temporal context block determination unit is used to determine the temporal context block corresponding to the candidate text block; The scoring generation submodule is used to calculate the second semantic similarity between the temporal context block and the current user question, and to generate the first score of the temporal context block according to the relationship between the second semantic similarity and a preset similarity threshold. The reordering submodule is used to reorder the candidate text blocks based on the first semantic similarity, the second semantic similarity, the first score, and a preset time sensitivity parameter to obtain reordered text blocks.

[0056] In another specific embodiment, the score generation submodule may include: a scoring unit, used to generate the first score for the temporal context block based on the difference and the semantic scoring index when the difference between the second semantic similarity and the preset similarity threshold is positive and the semantic scoring index corresponding to the candidate text block indicates that the semantic coherence of the candidate text block in the corresponding financial knowledge document meets the preset coherence condition.

[0057] In yet another specific embodiment, the reordering submodule may include: The time value calculation unit is used to perform quantitative calculations on the candidate text blocks based on a preset time sensitivity parameter to obtain the time value that represents the freshness of financial knowledge. The reordering unit is used to perform weighted calculations on the first semantic similarity, the second semantic similarity, the first score, and the time value, and reorder the candidate text blocks based on the calculated second score to obtain reordered text blocks.

[0058] In one specific embodiment, the time value calculation unit is specifically used to: use a time decay function based on Gaussian distribution, combined with a preset time sensitivity parameter, to perform quantitative calculation on the candidate text blocks to obtain the time value characterizing the freshness of financial knowledge.

[0059] In one specific embodiment, the answer generation module 13 may include: The prompt word template generation unit is used to construct prompt word templates to guide the large language model to generate answers based on the rearranged text blocks and the corresponding temporal metadata. The first retrieval answer generation unit is used to generate the retrieval answer corresponding to the current user question by using the large language model and based on the prompt word template, the rearranged text blocks and the temporal context blocks. The time-series metadata includes the publication time of the financial knowledge document corresponding to the text block, a first logical pointer and a second logical pointer used to index the preceding and following text blocks on the timeline of the business event, the physical location number corresponding to the text block, the time of the business event, the semantic vector, and a semantic scoring index used to measure the semantic coherence of the text block in the financial knowledge document.

[0060] In another specific embodiment, the answer generation module 13 may include: The target text block determination unit is used to determine a preset number of text blocks in the rearranged text blocks as target text blocks in descending order of their relevance to the current user question; The first retrieval answer generation unit is used to generate a retrieval answer corresponding to the current user's question based on the target text block and the target context block; the target context block is the text block in the temporal context block that corresponds to the target text block.

[0061] Furthermore, embodiments of this application also disclose an electronic device. Figure 6 is a structural diagram of an electronic device 20 according to an exemplary embodiment; the content of the diagram should not be construed as limiting the scope of this application.

[0062] Figure 6 is a schematic diagram of the structure of an electronic device 20 provided in an embodiment of this application. Specifically, the electronic device 20 may include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26. The memory 22 stores a computer program, which is loaded and executed by the processor 21 to implement the relevant steps of the financial knowledge document retrieval method disclosed in any of the foregoing embodiments. Optionally, the electronic device 20 in this embodiment may specifically be an electronic computer.

[0063] In this embodiment, the power supply 23 is used to provide operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and external devices, and the communication protocol it follows can be any communication protocol applicable to the technical solution of this application, and is not specifically limited here; the input / output interface 25 is used to acquire external input data or output data to the outside world, and its specific interface type can be selected according to specific application needs, and is not specifically limited here.

[0064] In addition, the memory 22, as a carrier for resource storage, can be a read-only memory, random access memory, disk or optical disk, etc. The resources stored thereon can include operating system 221, computer program 222, etc., and the storage method can be temporary storage or permanent storage.

[0065] The operating system 221, which may be Windows Server, Netware, Unix, Linux, etc., is used to manage and control the hardware devices on the electronic device 20 and the computer program 222. In addition to including a computer program capable of performing the financial knowledge document retrieval method executed by the electronic device 20 as disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs capable of performing other specific tasks.

[0066] Furthermore, this application also discloses a computer-readable storage medium for storing a computer program; wherein, when the computer program is executed by a processor, it implements the aforementioned disclosed financial knowledge document retrieval method. Specific steps of this method can be found in the corresponding content disclosed in the foregoing embodiments, and will not be repeated here.

[0067] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably. For the apparatus disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, the description is relatively simple; relevant parts can be referred to in the method section.

[0068] Those skilled in the art will further recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0069] The steps of the methods or algorithms described in conjunction with the embodiments disclosed herein can be implemented directly by hardware, a software module executed by a processor, or a combination of both. The software module can be located in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art.

[0070] Finally, it should be noted that in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0071] The technical solutions provided in this application have been described in detail above. Specific examples have been used to illustrate the principles and implementation methods of this application. The descriptions of the above embodiments are only intended to help understand the methods and core ideas of this application. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of this application. Therefore, the content of this specification should not be construed as a limitation of this application.

Claims

1. A method for retrieving financial knowledge documents, characterized in that it comprises: determining candidate text blocks corresponding to the current user question from a financial knowledge document library based on a first semantic similarity; reordering the candidate text blocks based on a second semantic similarity between a temporal context block and the current user question, and a preset time sensitivity parameter, to obtain reordered text blocks, wherein the temporal context block is the text block indexed by the logical pointer corresponding to the candidate text block, the preset time sensitivity parameter is a parameter set based on the type of financial knowledge to adjust the decay rate of financial knowledge, and the logical pointer is used to index the preceding and succeeding text blocks of the candidate text block on the timeline of the business event; and generating a retrieval answer corresponding to the current user question based on the reordered text blocks and the temporal context block.

2. The financial knowledge document retrieval method according to claim 1, characterized in that reordering the candidate text blocks based on the second semantic similarity between the temporal context block and the current user question, and on the preset time sensitivity parameter, to obtain the reordered text blocks comprises: determining the temporal context block corresponding to the candidate text block; calculating the second semantic similarity between the temporal context block and the current user question, and generating a first score for the temporal context block based on the relationship between the second semantic similarity and a preset similarity threshold; and reordering the candidate text blocks based on the first semantic similarity, the second semantic similarity, the first score, and the preset time sensitivity parameter to obtain the reordered text blocks.

3. The financial knowledge document retrieval method according to claim 2, characterized in that generating the first score for the temporal context block based on the relationship between the second semantic similarity and the preset similarity threshold comprises: if the difference between the second semantic similarity and the preset similarity threshold is positive, and the semantic scoring index corresponding to the candidate text block indicates that the semantic coherence of the candidate text block within its financial knowledge document meets a preset coherence condition, generating the first score of the temporal context block based on the difference and the semantic scoring index.

4. The financial knowledge document retrieval method according to claim 2, characterized in that reordering the candidate text blocks based on the first semantic similarity, the second semantic similarity, the first score, and the preset time sensitivity parameter to obtain the reordered text blocks comprises: performing a quantitative calculation on the candidate text blocks based on the preset time sensitivity parameter to obtain a time value representing the freshness of the financial knowledge; and computing a weighted combination of the first semantic similarity, the second semantic similarity, the first score, and the time value to obtain a second score, and reordering the candidate text blocks based on the second score to obtain the reordered text blocks.

5. The financial knowledge document retrieval method according to claim 4, characterized in that performing the quantitative calculation on the candidate text blocks based on the preset time sensitivity parameter to obtain the time value representing the freshness of the financial knowledge comprises: quantifying the candidate text blocks using a Gaussian-based time decay function and the preset time sensitivity parameter to obtain the time value representing the freshness of the financial knowledge.

6. The financial knowledge document retrieval method according to any one of claims 1 to 5, characterized in that generating the retrieval answer corresponding to the current user question based on the reordered text blocks and the temporal context block comprises: constructing, based on the reordered text blocks and their corresponding temporal metadata, a prompt template for guiding a large language model in generating answers; and generating, using the large language model and based on the prompt template, the retrieval answer corresponding to the current user question from the reordered text blocks and the temporal context block; wherein the temporal metadata includes the publication time of the financial knowledge document to which the text block belongs, a first logical pointer and a second logical pointer used to index the preceding and following text blocks on the timeline of the business event, the physical location number of the text block, the time of the business event, the semantic vector, and a semantic scoring index used to measure the semantic coherence of the text block within the financial knowledge document.

7. The financial knowledge document retrieval method according to any one of claims 1 to 5, characterized in that generating the retrieval answer corresponding to the current user question based on the reordered text blocks and the temporal context block comprises: determining a preset number of the reordered text blocks as target text blocks, taken in descending order of relevance to the current user question; and generating the retrieval answer corresponding to the current user question based on the target text blocks and a target context block, wherein the target context block is the text block among the temporal context blocks that corresponds to the target text block.

8. A financial knowledge document retrieval device, characterized in that it comprises: a candidate text block determination module, used to determine, from a financial knowledge document library and based on a first semantic similarity, candidate text blocks corresponding to a current user question; a reordering module, used to reorder the candidate text blocks based on a second semantic similarity between a temporal context block and the current user question, and on a preset time sensitivity parameter, to obtain reordered text blocks, wherein the temporal context block is the text block indexed by a logical pointer corresponding to the candidate text block, the preset time sensitivity parameter is a parameter set according to the type of financial knowledge to adjust the decay rate of the financial knowledge, and the logical pointer is used to index the text blocks preceding and following the candidate text block on the timeline of a business event; and an answer generation module, used to generate a retrieval answer corresponding to the current user question based on the reordered text blocks and the temporal context block.

9. An electronic device, characterized in that it comprises: a memory for storing a computer program; and a processor for executing the computer program to implement the financial knowledge document retrieval method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the financial knowledge document retrieval method according to any one of claims 1 to 7.
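
Illustrative sketch (not part of the claims). The reordering described in claims 2 to 5 can be pictured with the following minimal Python example. It is only one possible reading under stated assumptions: the claims do not fix the exact formulas, so the product used for the first score, the use of the larger of the two neighbor similarities as the second semantic similarity, the weights, the thresholds, the day-based age, and the per-type values in `SIGMA_BY_TYPE` are all hypothetical choices made for illustration. The names `Block`, `first_score`, `gaussian_time_value`, and `rerank` do not appear in the application.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical time sensitivity parameters (in days) per financial knowledge type.
# A smaller value makes the knowledge decay faster.
SIGMA_BY_TYPE: Dict[str, float] = {"regulation": 365.0, "market_rate": 30.0}


@dataclass
class Block:
    block_id: str
    knowledge_type: str            # used to look up the time sensitivity parameter
    first_sim: float               # first semantic similarity to the user question
    age_days: float                # assumed: days since the document was published
    prev_id: Optional[str] = None  # logical pointer to the preceding block on the event timeline
    next_id: Optional[str] = None  # logical pointer to the following block on the event timeline
    coherence: float = 1.0         # semantic scoring index (coherence within the document)


def gaussian_time_value(age_days: float, sigma_days: float) -> float:
    """Gaussian-based time decay: 1.0 for brand-new knowledge, approaching 0 with age."""
    return math.exp(-(age_days ** 2) / (2.0 * sigma_days ** 2))


def first_score(second_sim: float, threshold: float,
                coherence: float, min_coherence: float) -> float:
    """Score a temporal context block only when it clears both conditions.

    Assumption: the score is the product of the positive margin and the
    coherence index; the claims only say it is 'based on' the two values.
    """
    diff = second_sim - threshold
    if diff > 0 and coherence >= min_coherence:
        return diff * coherence
    return 0.0


def rerank(candidates: List[Block], context_sims: Dict[str, float],
           weights=(0.5, 0.2, 0.1, 0.2), threshold: float = 0.5,
           min_coherence: float = 0.6) -> List[Block]:
    """Reorder candidates by a weighted second score (highest first).

    context_sims maps a context block id to its precomputed similarity
    with the user question.
    """
    w_first, w_second, w_score, w_time = weights
    scored = []
    for c in candidates:
        neighbor_ids = [i for i in (c.prev_id, c.next_id) if i is not None]
        # Assumption: use the most similar neighbor as the second semantic similarity.
        second_sim = max((context_sims.get(i, 0.0) for i in neighbor_ids), default=0.0)
        s1 = first_score(second_sim, threshold, c.coherence, min_coherence)
        t = gaussian_time_value(c.age_days, SIGMA_BY_TYPE[c.knowledge_type])
        second_score = (w_first * c.first_sim + w_second * second_sim
                        + w_score * s1 + w_time * t)
        scored.append((second_score, c))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored]
```

In this sketch, two candidates with equal first semantic similarity and equal context similarity are separated only by the time value, so a recently published regulation ranks ahead of a superseded one. Choosing a smaller value in `SIGMA_BY_TYPE` for fast-changing knowledge types widens that gap.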