Knowledge graph quality evaluation-based large model generation link control method
By generating and executing structured query statements, calculating symbolic and neural side scores, and employing a gated dual-path fusion mechanism, this approach solves the problem of quantitatively linking graph state changes with the support capabilities of generation links in knowledge graph quality assessment methods. This achieves a unified and comparable expression of support capabilities and improves the interpretability of assessment results.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SICHUAN UNIV JINCHENG INST
- Filing Date
- 2026-05-11
- Publication Date
- 2026-06-12
AI Technical Summary
Existing knowledge graph quality assessment methods struggle to establish a quantitative correlation between graph state changes and the supporting capabilities of generation links, lack a unified and comparable expression method, cannot independently characterize graph contributions, and have insufficient applicability of assessment results in complex task scenarios and under non-ideal conditions.
By generating and executing structured query statements, obtaining query results, calculating symbolic and neural side scores, and employing a gated dual-path fusion mechanism, a unified scoring framework is formed, enhancing the interpretability and applicability of the evaluation results.
It achieves a quantitative correlation between changes in the state of the knowledge graph and the supporting capabilities of the generated tasks, forming a unified and comparable expression of supporting capabilities, which improves the interpretability and applicability of the evaluation results, especially the evaluation effect under complex tasks and non-ideal conditions.
Figure CN122198149A_ABST
Description
Technical Field
[0001] This invention relates to the fields of artificial intelligence and natural language processing, and in particular to a large model generation link control method based on knowledge graph quality assessment.
Background Technology
[0002] Knowledge graph quality assessment is a fundamental step in the construction, maintenance, and application of knowledge graphs. It aims to quantitatively describe the quality status of knowledge graphs and guide graph selection, completion, and downstream applications. Existing assessment techniques mainly include: structured assessment methods focusing on topological coverage and connectivity; methods relying on pattern constraints and logical rules for consistency checks; task-driven assessment methods that indirectly reflect graph quality through downstream tasks such as question answering, retrieval, and reasoning; learning-based assessment methods based on embedding representations, graph neural networks, or large language models for semantic judgment; and methods that comprehensively score multi-dimensional features.
[0003] Task-driven methods, such as the KGrEaT framework, reflect the impact of knowledge graphs on task results to some extent by replacing different knowledge graphs under fixed task settings and using task metrics to infer the graph support effect. However, this method essentially still indirectly derives the role of the graph from the overall task results, making it difficult to unify the changes in the state of the knowledge graph with the query support and answer behavior of the large language model in generative tasks.
[0004] Existing technologies generally suffer from the following shortcomings: it is difficult to establish a quantitative correlation between changes in the graph state and the support capability of the generation link; evaluation results lack a unified and comparable form of expression because they differ in units, measurement criteria, and applicable scenarios; the contribution of the graph itself is often mixed with external factors such as model structure, training process, and feature processing, and cannot be characterized independently; in complex task scenarios involving changes over time and comparisons between objects, the support capability is not comprehensively characterized; and the interpretability and problem localization capabilities of the evaluation results under abnormal conditions are insufficient, as are the applicability and comparability of results under non-ideal data conditions.
Summary of the Invention
[0005] This invention provides a large-scale model generation link control method based on knowledge graph quality assessment. The aim is to: establish a quantitative mapping relationship between changes in the quality state of the knowledge graph and the support effect of the large language model generation link, enabling graph visibility, query support status, and answer behavior to form a synergistic expression; construct a unified scoring framework so that evaluation results under different knowledge states, different question types, and different task scenarios can be compared and interpreted on the same scale; separate and highlight the actual contribution of the knowledge graph itself through independent characterization of symbol-side query support evaluation and neural-side answer behavior evaluation; improve the applicability of the evaluation for complex generation tasks (such as time-varying, object comparison, etc.); enhance the evaluation results' capabilities in anomaly localization, problem tracing, and auxiliary judgment; and ensure stable evaluation and feedback capabilities under non-ideal conditions such as data noise and knowledge limitations through a gated dual-path fusion scoring mechanism and link control module.
[0006] To achieve the above objectives, the present invention adopts the following technical solution: a large model generation link control method based on knowledge graph quality assessment, comprising:
obtaining the knowledge graph to be evaluated and a question set;
for each question in the question set, generating and executing a structured query statement for querying the knowledge graph to obtain query results, and determining a query hit flag based on the query results, wherein the query hit flag indicates whether valid data was retrieved from the knowledge graph;
calculating a symbol-side score based at least on the query hit flag and a knowledge visibility index, wherein the symbol-side score characterizes the query support capability;
inputting the question and the query results into a large language model to generate a natural language answer;
extracting answer behavior features from the natural language answer, and determining the consistency status between the answer behavior and the query result based on the answer behavior features and the query hit flag, wherein the answer behavior features include at least whether the natural language answer is a refusal to answer and whether it is an expression of uncertainty;
calculating a neural-side score in a scenario-adaptive manner based on the answer behavior features, the consistency status, and the query hit flag, wherein, in the query-hit and query-miss scenarios, the neural-side score evaluates definite answers, uncertain expressions, and refusals to answer in opposite directions;
performing gated dual-path fusion of the symbol-side score and the neural-side score based on the query hit flag to obtain a comprehensive score; and
generating, based on the comprehensive score, control signals for controlling the large language model generation link.
[0007] In this specification, the knowledge visibility index is determined based on a preset knowledge quality level; the calculation of the symbolic score is specifically as follows: the knowledge visibility index and the query hit flag are weighted and summed to obtain the symbolic score.
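The weighted sum described in [0007] can be sketched as follows. This is only an illustration: it assumes the knowledge visibility index lies in [0, 1] and the query hit flag is encoded as 0 or 1. The function name, the equal weights of 0.5, and the level-to-index table are hypothetical; the specification does not give concrete values.

```python
# Hypothetical mapping from preset knowledge quality levels to a visibility index.
VISIBILITY_BY_LEVEL = {"L0": 0.0, "L1": 0.25, "L2": 0.5, "L3": 0.75, "L4": 1.0}


def symbol_side_score(visibility_index: float, query_hit: bool,
                      w_visibility: float = 0.5, w_hit: float = 0.5) -> float:
    """Weighted sum of the knowledge visibility index and the query hit flag."""
    if not 0.0 <= visibility_index <= 1.0:
        raise ValueError("visibility_index is assumed to lie in [0, 1]")
    hit = 1.0 if query_hit else 0.0  # assumed 0/1 encoding of the hit flag
    return w_visibility * visibility_index + w_hit * hit


if __name__ == "__main__":
    # Same question evaluated at a mid quality level, with and without a hit.
    print(symbol_side_score(VISIBILITY_BY_LEVEL["L2"], True))
    print(symbol_side_score(VISIBILITY_BY_LEVEL["L2"], False))
```

With weights that sum to 1, the score stays in [0, 1], which keeps it on the same scale as the other scores it is later fused with.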
[0008] In this specification, the generation and execution of a structured query statement for querying the knowledge graph further includes: accessing the knowledge quality level configuration; and, based on the current knowledge quality level to be evaluated, adding a knowledge visibility constraint to the structured query statement so that the query is executed only within the knowledge range that is allowed to be accessed at the current level, thereby obtaining the query results and query hit flag corresponding to that knowledge quality level.
[0009] In this specification, adding the knowledge visibility constraint to the structured query statement based on the current knowledge quality level to be evaluated includes: identifying the target node variables in the structured query statement that are directly related to the natural language answer; determining the entity type corresponding to each target node variable; determining the corresponding knowledge visibility restriction based on the current knowledge quality level; and adding the knowledge visibility constraint to the query condition part or the node attribute constraint part of the structured query statement.
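One possible way to add such a constraint is sketched below, under several assumptions that are not stated in the specification. The query is assumed to be a single Cypher-like `MATCH ... [WHERE ...] RETURN ...` statement, the target node variable is assumed to be already identified and passed in, and visibility is assumed to be modelled as a numeric node property. The property name `visibility_rank`, the level names, and the limit table are all hypothetical.

```python
import re

# Hypothetical visibility limits: (knowledge quality level, entity type) -> the
# highest "visibility_rank" a node may carry and still be readable at that level.
VISIBILITY_LIMITS = {
    ("low", "Observation"): 1,
    ("high", "Observation"): 3,
}


def add_visibility_constraint(query: str, target_var: str, level: str) -> str:
    """Restrict target_var to the knowledge range visible at the given level."""
    # Determine the entity type from the node pattern, e.g. "(o:Observation)".
    node = re.search(rf"\(\s*{re.escape(target_var)}\s*:\s*(\w+)", query)
    if node is None:
        raise ValueError(f"no labelled node pattern found for {target_var!r}")
    entity_type = node.group(1)

    # Look up the restriction for this level and entity type.
    limit = VISIBILITY_LIMITS[(level, entity_type)]
    predicate = f"{target_var}.visibility_rank <= {limit}"

    # Add the predicate to the query condition part, just before RETURN.
    ret = re.search(r"\bRETURN\b", query, flags=re.IGNORECASE)
    if ret is None:
        raise ValueError("expected a RETURN clause")
    head, tail = query[:ret.start()].rstrip(), query[ret.start():]
    where = re.search(r"\bWHERE\b", head, flags=re.IGNORECASE)
    if where is None:
        head = f"{head} WHERE {predicate}"
    else:
        # Parenthesise the existing condition so an OR in it keeps its meaning.
        existing = head[where.end():].strip()
        head = f"{head[:where.end()]} ({existing}) AND {predicate}"
    return f"{head} {tail}"


if __name__ == "__main__":
    print(add_visibility_constraint(
        "MATCH (o:Observation) WHERE o.year = 2020 RETURN o.ndvi", "o", "low"))
```

Text-level rewriting like this is fragile for multi-clause queries; a production system would more likely constrain the query through a parser or through query parameters.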
[0010] In this specification, the extraction of answer behavior features from the natural language answer includes: The natural language answer is matched with a preset set of rejection templates to obtain a rejection matching score. When the rejection matching score is greater than or equal to a first threshold, it is determined that the natural language answer has a rejection behavior. The natural language answer is matched with a preset set of uncertainty templates to obtain an uncertainty matching score. When the uncertainty matching score is greater than or equal to a second threshold, it is determined that the natural language answer contains uncertain expressions.
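The template matching in [0010] could look roughly like the sketch below. The specification does not say how the matching score is computed, so the word-overlap score used here is an assumption. The template lists and the two threshold values of 0.8 are likewise hypothetical.

```python
import re

# Hypothetical template sets; a real deployment would preset its own.
REJECTION_TEMPLATES = ["cannot answer", "unable to answer", "no data available"]
UNCERTAINTY_TEMPLATES = ["not sure", "probably", "might be"]


def _tokens(text: str) -> set[str]:
    """Lowercased alphabetic words of a text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def match_score(answer: str, templates: list[str]) -> float:
    """Best fraction of a template's words that also occur in the answer."""
    answer_tokens = _tokens(answer)
    best = 0.0
    for template in templates:
        words = _tokens(template)
        if words:
            best = max(best, len(words & answer_tokens) / len(words))
    return best


def extract_behavior(answer: str, first_threshold: float = 0.8,
                     second_threshold: float = 0.8) -> tuple[bool, bool]:
    """Return (refused, uncertain) flags for a natural language answer."""
    refused = match_score(answer, REJECTION_TEMPLATES) >= first_threshold
    uncertain = match_score(answer, UNCERTAINTY_TEMPLATES) >= second_threshold
    return refused, uncertain


if __name__ == "__main__":
    print(extract_behavior("I cannot answer this question."))
    print(extract_behavior("The NDVI is probably around 0.6."))
```

A bag-of-words overlap ignores word order and negation scope, so it is only a stand-in for whatever matching the implementation actually uses.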
[0011] In this specification, the determination of the consistency between the answer behavior and the query result based on the answer behavior features and the query hit flag includes: when the query hit flag indicates a query hit and the natural language answer contains neither rejection behavior nor uncertain expression, the state is determined to be consistent; when the query hit flag indicates a query miss and the natural language answer contains rejection behavior or uncertain expression, the state is also determined to be consistent; in all other cases, the state is determined to be inconsistent.
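The consistency rule in [0011] reduces to a small truth table. The sketch below encodes it directly; only the function and parameter names are assumptions.

```python
def is_consistent(query_hit: bool, refused: bool, uncertain: bool) -> bool:
    """Consistency between answer behavior and query result, per [0011]."""
    if query_hit:
        # Knowledge was found: only a definite answer is consistent.
        return not refused and not uncertain
    # Nothing was found: refusing or hedging is the consistent behavior.
    return refused or uncertain


if __name__ == "__main__":
    for hit in (True, False):
        for refused in (True, False):
            for uncertain in (True, False):
                print(hit, refused, uncertain, is_consistent(hit, refused, uncertain))
```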
[0012] In this specification, the calculation of the neural-side score in a scenario-adaptive manner includes: when the query hit flag indicates a query hit, a definite answer is evaluated positively, and an uncertain expression or a refusal to answer is evaluated negatively; when the query hit flag indicates a query miss, an uncertain expression or a refusal to answer is evaluated positively, and a definite answer is evaluated negatively. Furthermore, the consistency status is used as a scoring weighting factor to constrain the neural-side score to be consistent with the query support scenario.
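A minimal sketch of this scenario-adaptive scoring follows. It assumes scores in [0, 1], where a positive evaluation is 1.0 and a negative evaluation is 0.2, and it assumes an inconsistent state multiplies the score by 0.5. These three numbers and the function name are hypothetical; the specification gives only the evaluation directions and the use of consistency as a weighting factor.

```python
def neural_side_score(query_hit: bool, refused: bool, uncertain: bool,
                      consistent: bool,
                      positive: float = 1.0, negative: float = 0.2,
                      inconsistency_factor: float = 0.5) -> float:
    """Score answer behavior with opposite directions for hit and miss scenarios."""
    definite = not refused and not uncertain
    if query_hit:
        # Knowledge is available: reward a definite answer.
        base = positive if definite else negative
    else:
        # Knowledge is missing: reward refusal or hedging instead.
        base = negative if definite else positive
    # The consistency status acts as a weighting factor on the base evaluation.
    return base * (1.0 if consistent else inconsistency_factor)


if __name__ == "__main__":
    print(neural_side_score(True, False, False, consistent=True))
    print(neural_side_score(False, False, False, consistent=False))
```

The consistency status is kept as a separate input because some embodiments can mark a definite answer to a hit query as inconsistent, for example when it contradicts the retrieved facts.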
[0013] In this specification, the process of performing gated dual-path fusion of the symbol-side score and the neural-side score based on the query hit flag to obtain a comprehensive score includes: A gated output value is generated using a smooth gating function that takes the query hit flag as input. Using the gated output value as a dynamic weight, the symbolic side score and the neural side score are weighted and fused according to the first set of path weights and the second set of path weights to obtain the comprehensive score; wherein, the first set of path weights corresponds to the query supported scenarios, and the second set of path weights corresponds to the query unsupported scenarios.
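The gated dual-path fusion can be sketched as below, assuming a logistic gate centred at 0.5 and a convex combination of the two weighted paths. The slope value, the centring, and both weight pairs are hypothetical. The weight pairs are chosen only to respect the ordering that some embodiments describe: the neural side is weighted more heavily when the query is supported, and the symbol side when it is not.

```python
import math


def gate(query_hit: bool, slope: float = 10.0) -> float:
    """Smooth gating value in (0, 1) computed from the query hit flag."""
    h = 1.0 if query_hit else 0.0
    return 1.0 / (1.0 + math.exp(-slope * (h - 0.5)))


def fuse(symbol_score: float, neural_score: float, query_hit: bool,
         supported_weights: tuple[float, float] = (0.4, 0.6),
         unsupported_weights: tuple[float, float] = (0.7, 0.3),
         slope: float = 10.0) -> float:
    """Blend the supported and unsupported paths using the gate as a dynamic weight."""
    g = gate(query_hit, slope)
    supported = supported_weights[0] * symbol_score + supported_weights[1] * neural_score
    unsupported = unsupported_weights[0] * symbol_score + unsupported_weights[1] * neural_score
    return g * supported + (1.0 - g) * unsupported


if __name__ == "__main__":
    print(round(fuse(0.75, 1.0, True), 4))
    print(round(fuse(0.25, 1.0, False), 4))
```

A smaller slope moves the gate output toward 0.5 and mixes the two paths more evenly; a larger slope approaches a hard switch between them.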
[0014] In this specification, generating control signals for controlling the large language model generation link based on the comprehensive score includes: when the query hit flag indicates a query hit, generating a release signal if the comprehensive score is higher than a preset release threshold, and generating a blocking signal or a review flag if the comprehensive score is lower than a preset security threshold; when the query hit flag indicates a query miss, generating no release signal, generating a knowledge support insufficiency warning signal and a review flag if the comprehensive score is not lower than the security threshold, and generating a knowledge support insufficiency warning signal and a blocking signal if the comprehensive score is lower than the security threshold.
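The control rules in [0014] can be written as a small decision function. In this sketch the threshold values and signal names are hypothetical. Two choices are assumptions made only for illustration, because the paragraph leaves them open: a hit-scenario score below the security threshold is mapped to a blocking signal rather than a review flag, and a hit-scenario score between the two thresholds produces no signal.

```python
from enum import Enum


class Signal(Enum):
    RELEASE = "release"
    BLOCK = "block"
    REVIEW = "review"
    KNOWLEDGE_SUPPORT_WARNING = "knowledge_support_warning"


def control_signals(query_hit: bool, score: float,
                    release_threshold: float = 0.8,
                    security_threshold: float = 0.4) -> list[Signal]:
    """Map the comprehensive score and hit flag to link control signals."""
    if query_hit:
        if score > release_threshold:
            return [Signal.RELEASE]
        if score < security_threshold:
            return [Signal.BLOCK]  # assumption: block chosen over review
        return []  # assumption: the middle band is not specified
    # Query miss: never release, always warn about insufficient knowledge support.
    if score >= security_threshold:
        return [Signal.KNOWLEDGE_SUPPORT_WARNING, Signal.REVIEW]
    return [Signal.KNOWLEDGE_SUPPORT_WARNING, Signal.BLOCK]


if __name__ == "__main__":
    print(control_signals(True, 0.9))
    print(control_signals(False, 0.9))
```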
[0015] In this specification, when the neural side score or the symbol side score is lower than a preset abnormality threshold, a status record marker is generated and stored together with the query hit flag and the comprehensive score for anomaly localization and statistical analysis.
[0016] In summary, the present invention has at least the following beneficial effects: (1) A quantitative correlation is established between changes in the state of the knowledge graph and its ability to support generation tasks: Existing knowledge graph quality assessment methods typically focus on the structural state of the graph itself, rule consistency, or specific task indicators, making it difficult to directly characterize how changes in the state of the knowledge graph affect its ability to support the generation tasks of large language models. This invention establishes a quantitative correspondence between changes in the state of the knowledge graph and changes in its ability to support generation tasks by performing structured queries on the same question at different knowledge quality levels and mapping the query support state and answer behavior state to a unified comprehensive score. The related multi-level verification results are shown in Example 6 and Figures 5 to 7.
[0017] (2) A unified, comparable, and collaboratively interpretable expression of supporting capabilities has been formed: Existing task-driven knowledge graph evaluation results usually rely on specific task indicators, and structural states, rule states, and task performance from different sources are often scattered, making it difficult to form a unified, comparable, and collaboratively interpretable expression of supporting capabilities. This invention maps query supporting states, answer behavior states, and gated dual-path fusion results to the same comprehensive scoring scale, enabling results from different knowledge states, different problem instances, and different task scenarios to be compared on a unified scale, and facilitating collaborative understanding of evaluation information from the symbolic and neural sides. Relevant verification results are shown in Example 6.
[0018] (3) Enhanced independent representation of knowledge graph support contributions: Existing task-driven evaluation methods typically use the overall task results as an indirect reflection of the knowledge graph's support role, making it difficult to further distinguish the actual contribution of the knowledge graph ontology from factors such as model structure, training process, or behavioral biases. This invention extracts symbolic features from query execution results and neural features from natural language answers, enabling the assessment of whether the knowledge graph provides query support and whether the answer behavior is coordinated with the support state within the same evaluation framework. This more directly reflects the actual support role of the knowledge graph in the generation task. The comparative verification results of related components are shown in Example 6.
[0019] (4) Enhanced support and characterization capabilities in complex task scenarios: Most existing knowledge graph quality assessment methods focus on the graph ontology state or specific task results, and their characterization of knowledge graph support in complex task scenarios involving time changes, object comparisons, and multiple constraints remains relatively limited. This invention maps query states and answer behaviors under different quality levels and question types to the same comprehensive scoring scale, enabling consistent characterization of results in complex generation task scenarios such as time changes and object comparisons. Relevant question type verification results are shown in Example 6.
[0020] (5) Enhanced the ability to assist in judgment of assessment results: Existing assessment methods usually rely solely on internal indicators or single task outputs to form results, lacking reference bases that can be used to assist in understanding the trend of result changes and to apply judgment. To assist in explaining the trend of changes in the comprehensive scoring results, this implementation introduces B1 (level label reference item) and B2 (hit status reference item) as reference comparison items to supplement the explanation of the trend of changes in the comprehensive scoring results under representative knowledge quality levels. The relevant auxiliary reference comparison results and Table 1 are explained in Example 6.
[0021] (6) Enhanced interpretability and problem localization capabilities of evaluation results: Existing solutions often struggle to pinpoint the source of problems when evaluation results are low or abnormal, whether it stems from knowledge miss, abnormal answer behavior, or inconsistency between the two. Therefore, their support for anomaly analysis and problem localization remains limited. This invention uses the query hit flag H as a supporting scenario differentiation condition, triggering different evaluation paths in query hit and query miss scenarios. During the scoring process, intermediate features such as knowledge visibility, number of result records, query hit flag H, rejection flag R, and uncertainty flags U and C are retained. This allows for differentiation of different causes when the overall score is low, including knowledge miss, model rejection, model uncertainty, and inconsistency between answer behavior and query result status. This provides a basis for query chain correction, answer strategy adjustment, and the generation of result status markers as described in step 11. Related results are shown in Example 6.
[0022] (7) Improved applicability and comparability of results under non-ideal data conditions: Existing knowledge graph quality assessment methods often lack systematic support for comparability of results and applicability under non-ideal conditions such as limited knowledge state, noise in the data, or changes in parameter settings. This invention, through a unified scoring framework and a gated dual-path evaluation mechanism, enables the method to generate comparable evaluation results even under conditions of limited knowledge state and changes in parameter settings. Relevant results are shown in Example 6.
Description of the Drawings
[0023] Figure 1 is a flowchart of the large model generation link control method based on knowledge graph quality assessment.
[0024] Figure 2 is a schematic diagram of the structure of the neural-side scoring module involved in this invention.
[0025] Figure 3 is a schematic diagram of the gated dual-path fusion scoring mechanism involved in this invention.
[0026] Figure 4 is a schematic diagram of the link control and status registration rules involved in this invention.
[0027] Figure 5 is a schematic diagram illustrating how the hit rate and the comprehensive score change with the knowledge quality level in this invention.
[0028] Figure 6 is a schematic diagram illustrating the multi-level monotonicity comparison involved in this invention.
[0029] Figure 7 is a schematic diagram illustrating the changes in the comprehensive score under different knowledge quality levels involved in this invention.
Detailed Implementation
[0030] The embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
[0031] Figure 1 shows the overall processing flow of the present invention, from question input, query generation and execution, two-sided scoring, and gated dual-path fusion to the generation of link control signals and result status markers. The nodes in the diagram belong to the query generation and execution module, the symbol-side feature extraction and scoring module, the neural-side feature extraction and scoring module, the gated dual-path fusion scoring module, and the scoring result feedback and link control module described in section 4.2.1 of the main text. The scoring result feedback and link control module receives the comprehensive score, the symbol-side score, the neural-side score, and the query hit flag; based on these, it triggers link control actions such as answer release, prompt output, manual review, or blocking return, and outputs the corresponding result status markers. All modules work together to serve the overall evaluation mechanism of "query support status identification -> two-sided evaluation -> gating path switching -> unified support capability score output -> link control signal / result status marker generation".
[0032] As shown in Figure 1, this embodiment provides a large model generation link control method based on knowledge graph quality assessment, including: acquiring a knowledge graph to be evaluated and a question set; for each question in the question set, generating and executing a structured query statement for querying the knowledge graph to obtain query results, and determining a query hit flag based on the query results, wherein the query hit flag characterizes whether valid data is retrieved from the knowledge graph; calculating a symbol-side score based at least on the query hit flag and a knowledge visibility index, wherein the symbol-side score characterizes the query support capability; inputting the question and the query results into a large language model to generate a natural language answer; extracting answer behavior features from the natural language answer, and determining the consistency status between the answer behavior and the query result based on the answer behavior features and the query hit flag, wherein the answer behavior features include at least whether the natural language answer is a refusal to answer and whether it is an uncertain expression; calculating a neural-side score in a scenario-adaptive manner based on the answer behavior features, the consistency status, and the query hit flag, wherein, in the query-hit and query-miss scenarios, the neural-side score evaluates definite answers, uncertain expressions, and refusals to answer in opposite directions; performing gated dual-path fusion of the symbol-side score and the neural-side score based on the query hit flag to obtain a comprehensive score; and generating, based on the comprehensive score, a control signal for controlling the large language model generation link.
[0033] In some embodiments, the calculation of the neural-side score in a scenario-adaptive manner further includes: when the query hit flag indicates a query hit and the natural language answer contains both uncertain expressions and definite content, the neural-side score is adjusted downward in proportion to the semantic weight of the uncertain expressions in the natural language answer; when the query hit flag indicates a query miss and the natural language answer provides a specific inference while expressing uncertainty, the specific inference part is identified, and the corresponding definite answer behavior is evaluated negatively.
[0034] In some embodiments, the determination of the consistency status further includes: When the query hit flag indicates a query hit, and the natural language answer is a definite expression, but the content of the definite expression conflicts factually with the content of the query result, it is determined to be in an inconsistent state; The detection of factual conflicts is achieved by comparing the key factual assertions in the natural language answer with the corresponding fields in the query result content.
[0035] In some embodiments, the neural side score is calculated using a scenario-adaptive approach based on the answer behavior characteristics, the consistency status, and the query hit flag, and the completeness of the query results is further taken into account. Specifically, when the query hit flag indicates that the query has been hit but the number of records in the query result content is lower than a preset completeness threshold, a corresponding scoring adjustment is applied to the degree of certainty shown in the natural language answer, so that the neural side score can reflect the supporting characteristics of the knowledge graph under data sparsity conditions.
[0036] In some embodiments, before performing gated dual-path fusion on the symbol-side score and the neural-side score based on the query hit flag, the method further includes: The query hit flag is smoothed to generate a continuous gated output value; The gated output value takes continuous values between 0 and 1 to achieve a smooth transition between the first set of path weights corresponding to the knowledge-supported scenarios and the second set of path weights corresponding to the knowledge-unsupported scenarios, thus avoiding a sharp change in the comprehensive score due to the binarization jump of the query hit flag.
[0037] In some embodiments, the smoothing process is implemented using a sigmoid (S-shaped) function with the query hit flag as input; the slope parameter of the sigmoid function is configurable and is used to adjust the steepness of the gating transition, so as to meet the different scoring stability requirements of different application scenarios.
[0038] In some embodiments, in the first set of path weights, the weight of the neural side score is greater than the weight of the symbolic side score; in the second set of path weights, the weight of the symbolic side score is greater than the weight of the neural side score. Therefore, in scenarios where the knowledge is already supported, the comprehensive score focuses more on evaluating the utilization and consistency of the retrieved knowledge by the large language model; in scenarios where the knowledge is not supported, the comprehensive score focuses more on evaluating the constraints of insufficient accessibility of the knowledge graph on the generation behavior.
[0039] In some embodiments, when generating the blocking signal or the review flag, the method further includes generating anomaly attribution information based on the relative magnitudes of the symbol-side score and the neural-side score. Specifically, when the symbol-side score is lower than the neural-side score and lower than the first attribution threshold, the anomaly attribution information indicates that the problem originates from missing knowledge graph data; when the neural-side score is lower than the symbol-side score and lower than the second attribution threshold, the anomaly attribution information indicates that the problem originates from abnormal generation behavior of the large language model; when both are lower than their respective thresholds, the anomaly attribution information indicates a composite anomaly.
[0040] In some embodiments, the anomaly attribution information is output along with the review flag, so that a preliminary judgment of the anomaly source is presented visually on the manual review interface, assisting reviewers in quickly locating the root cause of the problem.
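The attribution rule can be sketched as follows. The threshold values and label strings are hypothetical. The composite case is checked first so that it takes precedence when both scores fall below their thresholds; this ordering is an assumption.

```python
from typing import Optional


def attribute_anomaly(symbol_score: float, neural_score: float,
                      first_threshold: float = 0.4,
                      second_threshold: float = 0.4) -> Optional[str]:
    """Return a preliminary anomaly source label, or None if no rule applies."""
    symbol_low = symbol_score < first_threshold
    neural_low = neural_score < second_threshold
    if symbol_low and neural_low:
        return "composite_anomaly"
    if symbol_low and symbol_score < neural_score:
        return "knowledge_graph_data_missing"
    if neural_low and neural_score < symbol_score:
        return "llm_generation_behavior_abnormal"
    return None


if __name__ == "__main__":
    print(attribute_anomaly(0.1, 0.9))
    print(attribute_anomaly(0.9, 0.1))
    print(attribute_anomaly(0.1, 0.2))
```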
[0041] In some embodiments, the knowledge graph is stored in the form of a graph database, which includes structured data objects with physical or business attributes and their attribute fields; the structured data objects include at least spatial units, observation records and time points, and the attribute fields include at least ecological indicator fields associated with the spatial units.
[0042] In some embodiments, the question set includes at least one question type among single-value queries, time-varying queries, and comparison queries; The time change query involves the attribute changes of the same object at different points in time, and the comparison query involves the attribute comparison of different objects at the same point in time; the method can output a comparable comprehensive score for different question types.
[0043] The technical concept of this invention is as follows: This invention aims to propose a large-scale model generation link control method based on knowledge graph quality assessment, in order to better characterize the supporting role of knowledge graphs in generation tasks. The specific objectives of this invention are as follows: (1) Establishing a quantitative correlation between knowledge graph state changes and generation task support capabilities: This invention aims to enable the quality state changes of knowledge graphs to be quantitatively reflected in the generation task support capabilities, so as to no longer only focus on the local detection results of the graph ontology state, but to reflect the degree of influence of knowledge state changes on the support of the generation link.
[0044] (2) Establish a unified and comparable evaluation expression for support capabilities: The present invention aims to enable the evaluation results under different knowledge states, different problem instances and different task scenarios to be output under a unified scale, thereby forming a horizontally comparable evaluation result for support capabilities and improving the consistency and comparability of evaluation results between different scenarios.
[0045] (3) Enhance the independent representation ability of knowledge graph support contribution: The present invention aims to enable the evaluation results to more directly reflect the actual support role of knowledge graph in the generation task, reduce the interference of external factors such as downstream model structure, feature processing, data partitioning and training process on the judgment of knowledge graph contribution, thereby improving the ability to identify the role of knowledge graph itself.
[0046] (4) Enhance the support and characterization capabilities in complex task scenarios: The present invention aims to make the evaluation method not only applicable to simpler task scenarios, but also able to consistently characterize the support role of knowledge graphs in scenarios where query conditions are relatively complex, comparison objects or time conditions change, and knowledge organization relationships are relatively diverse.
[0047] (5) Enhance the auxiliary judgment ability of the evaluation results: The present invention aims to enable the evaluation results to provide auxiliary basis for understanding, application judgment and subsequent use, in addition to providing unified quantitative results, thereby improving the reference value and auxiliary judgment value of the evaluation results in practical applications.
[0048] (6) Enhance the interpretability and problem localization of evaluation results: This invention aims to enable the evaluation results to not only reflect the degree of support of knowledge graphs for large language model generation tasks, but also to provide a basis for problem source analysis, cause differentiation and targeted improvement when the evaluation results are low or abnormal, thereby improving the supporting role of evaluation results in anomaly explanation and problem localization.
[0049] (7) Improve the applicability and comparability of the evaluation method under non-ideal data conditions: The present invention aims to enable the evaluation method to form comparable evaluation results under non-ideal conditions such as incomplete knowledge, noise in the data, or changes in application conditions, thereby improving the applicability of the method in different usage environments.
[0050] Overall approach: As shown in Figure 1, this invention starts with query support status identification and determines the query support status of the current question under the current knowledge quality level through structured query generation, query optimization and query execution; symbolic evaluation and neural evaluation are then formed separately; the query hit status then drives gated dual-path fusion to output a unified support capability score, from which link control signals or result status markers are further generated.
[0051] This invention focuses on the support capability of underlying structured data content with clear physical or business attributes in knowledge graphs for the generation process: First, it identifies whether the current question, at the current knowledge quality level, receives query support from structured data content such as observation records, time points, and attribute fields; then, it forms symbolic-side support evaluation and neural-side answer behavior evaluation respectively; next, it performs gated dual-path switching based on query hit status under different knowledge support scenarios; finally, it outputs a unified support capability score. Through the above process, the accessibility status of underlying structured data content, query support status, and answer behavior status can be incorporated into the same evaluation framework, thereby forming a unified quality expression for large language model generation tasks. The overall process is shown in Figure 1.
[0052] Module 1. Query Generation and Execution Module: Step 1. Input Preparation: Required conditions: a knowledge graph to be evaluated, a question set, quality level configurations, graph database connection parameters, and a large language model API. Specifically: The knowledge graph is stored in a graph database and contains structured data objects with clear physical or business attributes together with their attribute fields, including grid cells, observation records, time points, and ecological indicator fields such as the Normalized Difference Vegetation Index (NDVI). The question set is stored in a structured file format and includes at least a question identifier, the question text, and a description of the query target related to the underlying business data. The quality level configurations represent the evaluation status under different knowledge visibility conditions. The graph database connection parameters include the database address, port, username, password, etc. The large language model API is used to generate the structured query statements and the final natural language answers.
[0053] Required components: Programming languages and their related libraries can be used to complete flow control, file reading, database connection, result parsing, and model invocation. Neo4j can be used as the graph database, and Cypher can be used as the structured query language.
[0054] Operations: Read the question set file and parse the question identifiers and question texts; load the quality level configuration; establish a connection with the graph database according to preset connection parameters; initialize the runtime environment required for subsequent query generation, query execution, answer generation, and score calculation.
[0055] In one implementation, the database connection can be completed through a graph database driver, and the model call interface can be initialized through Hypertext Transfer Protocol (HTTP) requests or through a Software Development Kit (SDK) interface; the runtime environment includes at least a question reading component, a database access component, a model call component, and a result parsing component.
[0056] Output: A list of questions available for further processing, a list of quality levels, a database connection object, and a model invocation environment.
[0057] Step 2. Query statement generation: Input: Question text and knowledge graph schema information. The schema information includes at least node type, relation type, and attribute field name.
[0058] Operation: For each natural language question, combine the question content with the knowledge graph schema information to generate a structured query statement that accesses the underlying structured data objects and their attribute fields. The structured query statement should be executable in the knowledge graph database and return structured data retrieval results relevant to the question.
[0059] In one implementation, the question text can first be parsed for entity, time, and target attributes, which are then mapped, using the knowledge graph schema information, to the corresponding structured data object node variables, relational paths, and return fields, thereby generating a structured query statement.
[0060] Output: The structured query statement corresponding to each question.
[0061] Step 3. Query statement optimization: Input: The structured query statement generated in step 2 and the current quality level.
[0062] Operation: When the current quality level is not a complete knowledge level, the structured query statement is optimized for quality level constraints. The optimization includes: (1) identifying the target node variable in the query statement; (2) determining the entity type corresponding to the target node variable; (3) according to the current quality level, concatenating the knowledge visibility constraint to the query condition part or the node attribute constraint part; (4) generating an optimized query statement that can be executed under the current quality level.
[0063] The identification of the target node variables can be accomplished through string parsing, pattern matching, rule matching, or other programmatic methods. The knowledge visibility constraints are used to control whether queries access knowledge content visible at the current quality level.
[0064] In one implementation, variable identification can be performed on the MATCH fragment of the original query statement to locate the target node variable directly related to the answer, and then the visibility constraint corresponding to the current quality level can be added to the WHERE clause or node attribute constraint part.
[0065] Output: The optimized query statement corresponding to the current quality level.
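The optimization in step 3 can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes a simple single-MATCH Cypher query ending in a RETURN clause, and the property name `min_visible_level` (the lowest quality level at which a node is visible) and both function names are hypothetical.

```python
import re

# Hypothetical node property: the lowest quality level at which the node is visible.
VISIBILITY_PROPERTY = "min_visible_level"


def find_target_variable(cypher: str) -> str:
    """Return the variable of the first returned expression (assumed answer target)."""
    match = re.search(r"\bRETURN\s+(?:DISTINCT\s+)?([A-Za-z_]\w*)", cypher, re.IGNORECASE)
    if match is None:
        raise ValueError("query has no RETURN clause")
    return match.group(1)


def add_visibility_constraint(cypher: str, target_var: str, level: int) -> str:
    """Add a visibility predicate on target_var for a non-complete quality level."""
    if level >= 100:
        return cypher  # complete knowledge level: the query is left unchanged
    predicate = f"{target_var}.{VISIBILITY_PROPERTY} <= {level}"
    ret = re.search(r"\bRETURN\b", cypher, re.IGNORECASE)
    if ret is None:
        raise ValueError("query has no RETURN clause")
    head, tail = cypher[:ret.start()].rstrip(), cypher[ret.start():]
    where = re.search(r"\bWHERE\b", head, re.IGNORECASE)
    if where:
        # Wrap the existing condition so the new predicate applies to all of it.
        condition = head[where.end():].strip()
        head = f"{head[:where.start()]}WHERE ({condition}) AND {predicate}"
    else:
        head = f"{head} WHERE {predicate}"
    return f"{head} {tail}"


query = "MATCH (g:GridCell)-[:HAS_OBS]->(o:Observation) WHERE o.date = '2020-07' RETURN o.ndvi"
print(add_visibility_constraint(query, find_target_variable(query), 50))
```

Plain string handling is used here because the description allows string parsing or pattern matching; a query containing the words WHERE or RETURN inside string literals would need a real Cypher parser.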
[0066] Step 4. Query execution: Input: The optimized query statement obtained in step 3.
[0067] Operation: Execute the optimized query statement on the knowledge graph database and record: (1) query execution status; (2) number of returned records; (3) query result content. When the query execution fails, record the failure status and error information; when the query execution succeeds, save the returned record set for subsequent symbol-side feature extraction and answer generation steps.
[0068] Output: The query execution status, the number of result records, and the structured query results.
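The bookkeeping of step 4 can be sketched as below. The database access is abstracted as a callable `run` that takes a query string and returns an iterable of records (for example, a thin wrapper around a graph database driver session); `QueryOutcome` and `execute_query` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, Optional


@dataclass
class QueryOutcome:
    status: str                                   # "success" or "failed"
    record_count: int = 0                         # number of returned records
    records: list = field(default_factory=list)   # structured query results
    error: Optional[str] = None                   # error information on failure


def execute_query(run: Callable[[str], Iterable[Any]], cypher: str) -> QueryOutcome:
    """Execute a query and record status, record count, and result content."""
    try:
        records = list(run(cypher))
    except Exception as exc:  # any driver or syntax error counts as a failed execution
        return QueryOutcome(status="failed", error=str(exc))
    return QueryOutcome(status="success", record_count=len(records), records=records)


# Usage with a stand-in runner instead of a live database.
outcome = execute_query(lambda q: [{"ndvi": 0.62}], "MATCH (o:Observation) RETURN o.ndvi")
print(outcome.status, outcome.record_count)
```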
[0069] Module 2. Symbolic Feature Extraction and Scoring Module: Step 5. Symbol-side feature extraction: Input: The query execution status, number of result records, and current quality level obtained in step 4.
[0070] Operation: Extract symbolic features from the query execution results. The symbolic features are used to characterize the accessibility status and query support of the underlying structured data content with clear physical or business attributes under the current knowledge quality level, and to reflect the coupling state of "current question - current level - current structured query results". The symbolic features include at least: (1) a knowledge visibility index, characterizing the degree of knowledge visibility under the current quality level; (2) a query hit flag, characterizing whether the query returns valid records; (3) a result record count, characterizing the number of records returned by the current query. When the result record count is greater than 0, the query hit flag is 1; when the result record count is equal to 0, the query hit flag is 0.
[0071] This invention preserves both knowledge visibility and query hit status in the symbolic-side features, thus characterizing the accessibility of the underlying structured data content at the current level and whether the current question actually receives query support from this type of data object. Furthermore, the symbolic-side evaluation characterizes the instantiated support capability under the coupled state of "current question - current level - current structured query result" through knowledge visibility, query hit status, and result record count.
[0072] Output: The symbolic feature set. Output example: The symbolic feature set can be represented as {V,H,N}, where V represents the knowledge visibility under the current knowledge quality level; V can be given directly by the current quality level configuration or calculated as the proportion of visible knowledge to complete knowledge at the current level; in this embodiment, V can take preset level values such as {100,75,50,35}; H represents whether the current query is hit in this graph state, i.e., the query hit flag; N indicates the number of result records.
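Step 5 amounts to a small mapping from the execution result to {V,H,N}. A minimal sketch, in which the function name is hypothetical and V is assumed to come directly from the quality level configuration:

```python
def extract_symbolic_features(level_visibility: float, record_count: int) -> dict:
    """Build the symbolic feature set {V, H, N} for one question at one quality level."""
    if record_count < 0:
        raise ValueError("record_count must be non-negative")
    hit = 1 if record_count > 0 else 0  # query hit flag: 1 when any record is returned
    return {"V": level_visibility, "H": hit, "N": record_count}


print(extract_symbolic_features(75, 2))   # a hit at the 75% level
print(extract_symbolic_features(35, 0))   # a miss at the 35% level
```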
[0073] Step 6. Calculation of symbol-based scores: Input: The set of symbolic side features obtained in step 5.
[0074] Operation: Calculate the symbolic-side score from the knowledge visibility index and the query hit flag. The symbolic-side score $S_{sym}$ is calculated by formula (1): $$S_{sym} = \alpha V + \beta \cdot 100\,H \quad (1)$$ where α is the knowledge visibility weight parameter, characterizing the contribution ratio of the knowledge visibility index V in the symbolic-side score, and β is the query hit weight parameter, characterizing the contribution ratio of the query hit flag H in the symbolic-side score. α and β are preset non-negative weight parameters that can be configured according to the application scenario, question type, or evaluation task requirements.
[0075] N is retained as an intermediate feature, mainly used for auxiliary analysis, anomaly localization, and result interpretation, and is not used as the core main formula variable for comprehensive scoring.
[0076] The weight parameters satisfy the non-negativity constraint; in a preferred setting, the sum of weights in the same group is 1. The weight parameters can be determined empirically or by parameter tuning using a verification set, and can be optimized according to specific application scenarios.
[0077] Output: The symbolic-side score $S_{sym}$.
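The symbolic-side score of step 6 can be sketched as a weighted sum. The linear form below, with the hit flag rescaled to the 0 to 100 range of V, and the example weights α = 0.4 and β = 0.6 are assumptions; they were chosen because they reproduce the symbolic-side values 90 (75% level, hit) and 14 (35% level, miss) quoted in the later calculation examples.

```python
def symbolic_score(V: float, H: int, alpha: float = 0.4, beta: float = 0.6) -> float:
    """Assumed linear form: alpha * V + beta * 100 * H, with non-negative weights."""
    if alpha < 0 or beta < 0:
        raise ValueError("weights must be non-negative")
    return alpha * V + beta * 100.0 * H


print(symbolic_score(75, 1))  # 75% level, query hit
print(symbolic_score(35, 0))  # 35% level, query miss
```

The result record count N is deliberately absent: the description keeps it as an auxiliary feature for analysis and anomaly localization rather than as a variable of the score.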
[0078] Module 3. Neural-Side Feature Extraction and Scoring Module: Step 7. Answer Generation: Input: The user question and the structured query results obtained in step 4.
[0079] Operation: Input the user's question and query results into the large language model to generate a corresponding natural language answer. This answer serves as the analysis object for subsequent neural side feature extraction.
[0080] Output: The natural language answer to each question.
[0081] Step 8. Neural side feature extraction: Input: The natural language answer generated in step 7 and the query result status obtained in step 4.
[0082] Operation: As shown in Figure 2, neural-side features are extracted from the natural language answers generated by the large language model. These neural-side features characterize the answer behavior of the large language model under the current query result state and depict the coordination relationship between this behavior and the knowledge support state; the neural-side evaluation revolves around this coordination relationship. Figure 2 shows the input, answer behavior analysis, uncertainty marker extraction, refusal marker extraction, answer-result consistency determination, and neural-side score calculation of the neural evaluation module. It also highlights the module's evaluation logic for answer behavior patterns and their coordination with the knowledge support status, forming a scenario-based evaluation process oriented towards answer behavior.
[0083] Let the natural language answer text be $A$, the set of refusal templates be $T_R$, and the set of uncertainty templates be $T_U$. Define a string matching function $\mathrm{sim}(A, t)$ representing the degree of matching between the answer text $A$ and a template $t$. This degree of matching can be calculated using keyword hit rate, normalized edit distance similarity, Jaro-Winkler similarity, or other string matching methods.
[0084] Furthermore, the refusal match score $m_R$ is calculated by formula (2) and the uncertainty match score $m_U$ is calculated by formula (3), where $t_r$ denotes an element of the refusal template set $T_R$ and $t_u$ denotes an element of the uncertainty template set $T_U$.
[0085] (1) Refusal flag: The refusal flag $R$ is used to detect whether the large language model explicitly refuses to answer. It is determined from the refusal match score $m_R$ of formula (2), as shown in formula (4): $$R = \begin{cases} 1, & m_R \ge \theta_R \\ 0, & \text{otherwise} \end{cases} \quad (4)$$ (2) Uncertainty flag: The uncertainty flag $U$ is used to detect whether the answer contains uncertain expressions. It is determined from the uncertainty match score $m_U$ of formula (3), as shown in formula (5): $$U = \begin{cases} 1, & m_U \ge \theta_U \\ 0, & \text{otherwise} \end{cases} \quad (5)$$ where $R$ is the refusal flag; $U$ is the uncertainty flag; $\theta_R$ is the threshold for determining refusal; and $\theta_U$ is the threshold for determining uncertainty.
[0086] In one implementation, the two flags can be determined by comparing the string matching score with the threshold; in other implementations, they can also be determined by rule template matching or by classifier output.
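The flag extraction above can be sketched with a keyword hit rate as the matching function. Several parts are assumptions made only for illustration: the template lists, the thresholds of 0.8, the order-insensitive token matching, and the use of the maximum over templates to aggregate the match score (the description does not fix any of these here).

```python
import re

# Hypothetical template sets; a real deployment would supply its own.
REFUSAL_TEMPLATES = ["cannot answer", "unable to answer", "refuse to answer"]
UNCERTAINTY_TEMPLATES = ["not sure", "may be", "insufficient data"]


def _tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def match_degree(answer: str, template: str) -> float:
    """Keyword hit rate: share of the template's tokens that occur in the answer."""
    wanted = _tokens(template)
    return len(wanted & _tokens(answer)) / len(wanted) if wanted else 0.0


def match_score(answer: str, templates: list) -> float:
    """Assumed aggregation: the best match over all templates in the set."""
    return max((match_degree(answer, t) for t in templates), default=0.0)


def behavior_flags(answer: str, theta_r: float = 0.8, theta_u: float = 0.8) -> tuple:
    """Return (R, U): refusal flag and uncertainty flag by thresholding the scores."""
    R = int(match_score(answer, REFUSAL_TEMPLATES) >= theta_r)
    U = int(match_score(answer, UNCERTAINTY_TEMPLATES) >= theta_u)
    return R, U


print(behavior_flags("I cannot answer this question from the available records."))
print(behavior_flags("I am not sure, the data may be incomplete."))
```

Swapping `match_degree` for an edit-distance or Jaro-Winkler similarity leaves the thresholding logic unchanged.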
[0087] (3) Answer-result consistency flag: The answer-result consistency flag $C$ is used to detect whether the answer behavior is consistent with the status of the query result. It is determined by formula (6): $$C = \begin{cases} 1, & H = 1 \text{ and } U = 0 \text{ and } R = 0 \\ 1, & H = 0 \text{ and } (U = 1 \text{ or } R = 1) \\ 0, & \text{otherwise} \end{cases} \quad (6)$$ This formal expression corresponds strictly to the original judgment semantics, specifically: 1. When the query is hit and the answer is a definite expression, $C$ takes 1. 2. When the query is missed and the answer is an uncertain expression or a refusal, $C$ takes 1. 3. When the query is hit but the answer is an uncertain expression or a refusal, $C$ takes 0. 4. When the query is missed but the answer is still a definite expression, $C$ takes 0.
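The four cases listed above translate directly into a small function. This is a sketch with a hypothetical function name, taking binary H, U, and R as defined in the preceding paragraphs:

```python
def consistency_flag(H: int, U: int, R: int) -> int:
    """Answer-result consistency C from the hit flag H and the behavior flags U, R."""
    hedged = bool(U or R)           # the answer is uncertain or a refusal
    if H == 1:
        return 0 if hedged else 1   # a hit should be answered definitely
    return 1 if hedged else 0       # a miss should be hedged or refused


for H, U, R in [(1, 0, 0), (0, 1, 0), (1, 0, 1), (0, 0, 0)]:
    print(H, U, R, "->", consistency_flag(H, U, R))
```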
[0088] Output: The set of neural side features is obtained.
[0089] Step 9. Calculation of the neural-side score: Input: The set of neural-side features obtained in step 8.
[0090] Operation: Based on the uncertainty flag $U$, the refusal flag $R$, and the answer-result consistency flag $C$, the neural-side score $S_{neu}$ is calculated. The neural-side score quantifies the degree of coordination between the answer behavior and the current knowledge support state, with the evaluation focusing on the matching relationship between the two. To avoid scoring refusal behavior and uncertainty expression in the same direction in both query hit and query miss scenarios, the neural-side score adopts a scenario-adaptive calculation method, given by formula (7), in which $\gamma_U$ is the weight parameter for uncertainty-expression behavior, characterizing the degree of impact of the uncertainty flag $U$ on the neural-side score under different query support scenarios; $\gamma_R$ is the weight parameter for refusal behavior, characterizing the degree of impact of the refusal flag $R$ on the neural-side score under different query support scenarios; and $\gamma_C$ is the consistency state weight parameter, characterizing the constraint effect of the answer-result consistency flag $C$ on the neural-side score. $\gamma_U$, $\gamma_R$, and $\gamma_C$ are all preset non-negative weight parameters, which can be configured according to different application scenarios.
[0091] The weight parameters satisfy the non-negativity constraint; in a preferred configuration, the sum of the weights is 1. The scenario-adaptive calculation method works as follows: when $H = 1$, the neural-side score gives a positive evaluation to definite answers; when $H = 0$, the neural-side score gives a positive evaluation to refusals and expressions of uncertainty; $C$ serves as the primary criterion, constraining whether the aforementioned behavior is consistent with the query support status.
[0092] Output: The neural-side score $S_{neu}$.
[0093] Module 4. Gated Dual-Path Fusion Scoring Module: Step 10. Calculation of Fusion Score: Input: The symbolic-side score $S_{sym}$ obtained in step 6, the neural-side score $S_{neu}$ obtained in step 9, and the query hit flag $H$.
[0094] Operation: As shown in Figure 3, based on the query hit status, a gated dual-path fusion of the symbolic-side score and the neural-side score is performed to obtain a comprehensive score. The query hit flag H distinguishes between two evaluation scenarios, "knowledge-supported" and "knowledge-unsupported", and triggers a different evaluation path for each. In the query-hit scenario, the evaluation focus shifts to the answer's utilization of and consistency with the retrieved knowledge; in the query-miss scenario, the evaluation focus shifts to insufficient graph accessibility and its constraints on generation behavior. The gating in this step therefore triggers the corresponding evaluation path under each support scenario, ensuring that the evaluation results reflect the semantic differences between them. Figure 3 shows the processing logic of switching between symbolic evaluation and neural evaluation under the gating condition of the query hit flag H in the two states of query hit and query miss, and how a unified support capability score is generated according to the corresponding path weights.
[0095] The gating function is given by formula (8): $$G = \sigma\big(k\,(H - \tau)\big) \quad (8)$$ where $\sigma$ is the Sigmoid activation function, used to constrain the gated output value $G$ within the continuous interval [0,1] and to avoid scoring cliffs caused by binarization jumps. The comprehensive score is given by formula (9): $$S = G\,(w_1 S_{sym} + w_2 S_{neu}) + (1 - G)\,(w_3 S_{sym} + w_4 S_{neu}) \quad (9)$$ where $G$ is the output of the gating function; $k$ is the gating slope adjustment parameter, which controls the smoothness of the gating transition, has a default value of 10, and can be adaptively adjusted within the range [5,20] according to the stability requirements of the application scenario; $\tau$ is the gating threshold; $w_1$ is the path weight of the symbolic-side score $S_{sym}$ in the query-supported scenario and $w_2$ is the path weight of the neural-side score $S_{neu}$ in the query-supported scenario; $w_3$ is the path weight of the symbolic-side score $S_{sym}$ in the query-unsupported scenario and $w_4$ is the path weight of the neural-side score $S_{neu}$ in the query-unsupported scenario. $w_1$, $w_2$, $w_3$, and $w_4$ are preset non-negative path weight parameters, which can be configured according to different generation task scenarios or risk control requirements; in a preferred setting, $w_1 + w_2 = 1$ and $w_3 + w_4 = 1$. Different query states correspond to different evaluation paths, and each path uses a weight set adapted to its evaluation scenario; the weight set can be preset, determined by validation set parameter tuning, and optimized in combination with specific application scenarios. The formula characterizes the implementation form of the evaluation path under different support scenarios and uniformly maps knowledge visibility, query support status, and answer behavior coordination into a support capability score. When $G$ approaches 1, the scenario is knowledge-supported, and the evaluation focus shifts to the answer's utilization of and consistency with the retrieved knowledge. When $G$ approaches 0, the scenario is knowledge-unsupported, and the evaluation focus shifts to the insufficient reachability of the graph and its constraints on the generation behavior. In actual implementation, when $H$ is a binary variable, the above gating function can be regarded as a smoothed expression of the original two-scenario switching mechanism. The smoothed gating expression is used to enhance the formalization of the formula while maintaining the distinction of evaluation semantics between the supporting scenarios.
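The gating and fusion of step 10 can be sketched as follows. The sketch assumes the gate is a sigmoid of the slope-scaled difference between H and a gating threshold, and that the comprehensive score blends a "supported" path and an "unsupported" path by G; the threshold value `tau = 0.5` and the path weights in the usage lines are illustrative assumptions, not values fixed by the description.

```python
import math


def gate(H: float, k: float = 10.0, tau: float = 0.5) -> float:
    """Smoothed gate in (0, 1): sigmoid of the slope-scaled distance of H from tau."""
    return 1.0 / (1.0 + math.exp(-k * (H - tau)))


def fused_score(s_sym: float, s_neu: float, G: float,
                hit_weights: tuple, miss_weights: tuple) -> float:
    """Blend the supported path (w1, w2) and the unsupported path (w3, w4) by G."""
    w1, w2 = hit_weights
    w3, w4 = miss_weights
    if min(w1, w2, w3, w4) < 0:
        raise ValueError("path weights must be non-negative")
    hit_path = w1 * s_sym + w2 * s_neu
    miss_path = w3 * s_sym + w4 * s_neu
    return G * hit_path + (1.0 - G) * miss_path


G_hit, G_miss = gate(1), gate(0)
print(round(G_hit, 4), round(G_miss, 4))
print(fused_score(90, 100, G_hit, (0.3, 0.7), (0.7, 0.3)))
```

With binary H and the default slope of 10, the gate sits within about 0.007 of 1 or 0, so each score is dominated by one path while remaining a continuous function of its inputs.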
[0096] Through the above gating switching and dual-path evaluation processing, the comprehensive score can adapt to the differences in how knowledge graphs support large language model generation tasks under different query states, and unify knowledge visibility, query support status and answer behavior coordination into the same support capability expression.
[0097] The comprehensive score output by this method unifies knowledge visibility, query support status, and answer behavior coordination into a score of the support capability of the current question under the current knowledge status. This result can be further statistically analyzed according to knowledge quality level, question type, and parameter configuration to verify its unified expression capability for different knowledge support scenarios and changes in knowledge quality.
[0098] Output: The comprehensive score $S$, which is a unified quantitative expression of the knowledge graph's ability to support large language model generation tasks.
[0099] In some embodiments, before the gated dual-path fusion of the symbolic-side score and the neural-side score based on the query hit flag, multi-dimensional fusion processing of the gated input features is further performed to generate a continuous gated output value adapted to the knowledge graph quality state, addressing the weak single-input gating logic of the original scheme and its disconnection from the knowledge quality state. The parameters of the gate function of the multi-dimensional fusion processing are defined as follows: $V_{max}$ is the maximum knowledge visibility corresponding to the complete knowledge level (fixed value of 100), used to normalize the knowledge visibility to the [0,1] interval; $N_0$ is the preset result completeness threshold (default value 3), used to characterize the completeness of query results, normalized to the [0,1] interval; $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ are the weight coefficients for query hit status, knowledge visibility, result completeness, and consistency status respectively, satisfying the non-negativity constraint and $\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 = 1$, with default values $\lambda_1 = 0.5$, $\lambda_2 = 0.2$, $\lambda_3 = 0.15$, $\lambda_4 = 0.15$; one further parameter has an example value of 0.5; the global gating offset threshold is fixed at 0.5 and is used to balance the overall distribution of the multi-dimensional inputs, ensuring that the gating output approaches 1 in the query hit scenario and approaches 0 in the query miss scenario while achieving a smooth transition across knowledge quality levels.
[0100] Calculation Example: Using the 75% level hit scenario from Example 2, given H=1, V=75, N=1, C=1, substituting the default parameters gives $G = \sigma(3.25) \approx 0.963$; in the 35% level miss scenario, given H=0, V=35, N=0, and C=1, substituting the default parameters, the calculation yields... This enables a clear distinction between hit and miss scenarios.
[0101] In some embodiments, the symbolic-side score and the neural-side score are weighted and fused according to a first set of path weights and a second set of path weights, using the gated output value as a dynamic weight. Specifically, dynamic weights are generated adaptively from the knowledge graph quality status and the answer behavior consistency, replacing fixed preset weights and enhancing the inventiveness and scenario adaptability of the gating mechanism. The generation rules, fusion formula, and parameter definitions of the dynamic weights are as follows. The fusion formula is $S = G\,(w_1 S_{sym} + w_2 S_{neu}) + (1 - G)\,(w_3 S_{sym} + w_4 S_{neu})$, where $S_{sym}$ is the symbolic-side score and $S_{neu}$ is the neural-side score. The first set of path weights (knowledge-supported scenario, corresponding to G approaching 1) is generated as $w_1 = w_{b1}\,(1 - C)$, $w_2 = 1 - w_1$. The second set of path weights (knowledge-unsupported scenario, corresponding to G approaching 0) is generated as $w_3 = w_{b2} + (1 - V/100)\,(1 - w_{b2})$, $w_4 = 1 - w_3$. Here $w_{b1}$ is the base weight of the symbolic-side score in the knowledge-supported scenario, with a default value of 0.4, and $w_{b2}$ is the base weight of the symbolic-side score in the knowledge-unsupported scenario, with a default value of 0.6. Weight constraints: all dynamic weights satisfy the non-negativity constraint, and the weights of the same scenario always sum to 1, i.e., w1+w2=1 and w3+w4=1. Core adaptive logic: 1. In the knowledge-supported scenario, when the answer and result are completely consistent (C=1), w1=0 and w2=1, so the comprehensive score is entirely dominated by the neural-side score and focuses on evaluating the large model's compliant use of, and consistency with, the retrieved knowledge. When the answer and result are inconsistent (C=0), $w_1 = w_{b1}$ and $w_2 = 1 - w_{b1}$, which automatically increases the weight of the symbolic-side score, weakens the interference of abnormal answer behavior on the comprehensive score, and achieves adaptive correction for abnormal scenarios. 2. In the knowledge-unsupported scenario, the lower the knowledge visibility V, the larger w3, and the more the comprehensive score focuses on evaluating the knowledge support capability on the symbolic side. This depicts the degree of constraint that insufficient accessibility of the knowledge graph places on the generation link, and is closely tied to the core purpose of the invention.
[0102] Calculation Example: Using the parameters and results from Examples 3-5: in the 75% level hit scenario, given a symbolic-side score of 90, a neural-side score of 100, and C=1, the calculation gives w1=0 and w2=1; in the 35% level miss scenario, given a symbolic-side score of 14, a neural-side score of 85, and V=35, the calculation gives w3 = 0.6 + 0.65 × 0.4 = 0.86 and w4 = 0.14. The final comprehensive score is then calculated with these weights, achieving dynamic adaptation of the weights.
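The weight values in this calculation example can be reproduced with the sketch below. The linear generation rules are inferred from the stated adaptive logic (w1 is 0 when C=1 and equals the base weight when C=0; w3 grows as V falls) and from the arithmetic 0.6 + 0.65 × 0.4 = 0.86, so they should be read as an assumption; the function names are hypothetical.

```python
def hit_path_weights(C: int, base: float = 0.4) -> tuple:
    """(w1, w2) for the knowledge-supported path; base is the symbolic-side base weight."""
    w1 = base * (1 - C)          # consistent answers (C=1) hand all weight to the neural side
    return w1, 1.0 - w1


def miss_path_weights(V: float, base: float = 0.6, v_max: float = 100.0) -> tuple:
    """(w3, w4) for the knowledge-unsupported path; lower visibility raises w3."""
    w3 = base + (1.0 - V / v_max) * (1.0 - base)
    return w3, 1.0 - w3


print(hit_path_weights(C=1))
print(tuple(round(w, 2) for w in miss_path_weights(V=35)))
```

Both pairs sum to 1 by construction, which matches the stated weight constraint for each scenario.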
[0103] In some embodiments, the gated dual-path fusion process further includes a gating correction mechanism for abnormal scenarios. For core abnormal scenarios in which the query hit status and the answer behavior do not match, the gated output value and the comprehensive score are corrected in a targeted manner to enhance the ability to identify and constrain abnormal behaviors in the generation link. The abnormal scenario correction rules, calculation formulas, and parameter definitions are as follows: 1. Abnormal scenario judgment rules: A scenario is judged abnormal if either of the following conditions is met, which is fully compatible with the consistency determination logic of the original solution: Scenario A: query hit (H=1) and answer-result inconsistency (C=0), that is, the knowledge is supported but the large model refuses to answer, expresses uncertainty, or produces a factual conflict; Scenario B: query miss (H=0) and answer-result inconsistency (C=0), that is, the knowledge is not supported but the large model still outputs a definite answer. 2. Gated output correction formula: $G' = \mathrm{clamp}(G + \delta \cdot \eta,\ 0,\ 1)$, where $G'$ is the corrected gated output value; clamp() is the boundary constraint function, which keeps the output value within the range [0,1]; $\delta$ is the gated directional offset, with $\delta = -0.3$ in scenario A (weakening the weight of the supported scenario) and $\delta = 0.3$ in scenario B (strengthening the weight of the unsupported scenario); $\eta$ is the correction intensity coefficient, positively correlated with the degree of consistency deviation, with $\eta = 1$ when there is a factual conflict and $\eta = 0.5$ when there is only an inconsistency in expression. 3. Comprehensive score correction formula: the corrected gated output value $G'$ replaces the original gated output value G in the weighted fusion, and a scoring penalty is added for abnormal scenarios. The corrected comprehensive score is $S' = \big[G'\,(w_1 S_{sym} + w_2 S_{neu}) + (1 - G')\,(w_3 S_{sym} + w_4 S_{neu})\big]\,(1 - p\,\eta)$, where $S_{sym}$ and $S_{neu}$ are the symbolic-side and neural-side scores, w1 to w4 are the path weights, and $p$ is the penalty coefficient, with a value range of [0.2, 0.5] and a default value of 0.3. The penalty quantifies the impact of abnormal behavior on the comprehensive score, ensuring that the scoring results in abnormal scenarios reflect the true level of the knowledge graph's support capability and avoiding interference from abnormal generation behavior in the quality assessment results.
[0104] Calculation Example: Scenario A anomaly example: given H=1, C=0, an original gated output value G=0.963, an original comprehensive score of 99.05, and a factual conflict (correction intensity coefficient of 1), substituting gives a corrected gated output value of clamp(0.963-0.3×1,0,1)=0.663 and a corrected comprehensive score of 99.05×(1-0.3×1)=69.34. This achieves identification and scoring correction of abnormal scenarios, enhancing the practical value of the gating mechanism.
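The arithmetic of this example can be checked with a short sketch. The function names are hypothetical, and the multiplicative penalty form (score times one minus penalty times intensity) is read off the example's own arithmetic; as in the example, the penalty is applied here to a score passed in by the caller rather than recomputing the fusion with the corrected gate.

```python
from typing import Optional


def clamp(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    return max(lo, min(hi, x))


def classify_anomaly(H: int, C: int) -> Optional[str]:
    """Scenario 'A' (hit but inconsistent), 'B' (miss but inconsistent), or None."""
    if C == 0:
        return "A" if H == 1 else "B"
    return None


def corrected_gate(G: float, scenario: Optional[str], intensity: float) -> float:
    """Shift G by the directional offset (-0.3 for A, +0.3 for B) scaled by intensity."""
    offset = {"A": -0.3, "B": 0.3}.get(scenario, 0.0)
    return clamp(G + offset * intensity)


def penalized_score(score: float, intensity: float, penalty: float = 0.3) -> float:
    """Apply the abnormal-scenario penalty to a comprehensive score."""
    return score * (1.0 - penalty * intensity)


scenario = classify_anomaly(H=1, C=0)
print(scenario, round(corrected_gate(0.963, scenario, 1.0), 3))
print(round(penalized_score(99.05, 1.0), 3))
```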
[0105] In some embodiments, the gated output value and comprehensive score obtained by the gated dual-path fusion are also used to reverse-optimize the structured query statement and the knowledge visibility constraints, forming a closed-loop linkage with the generation link control module and addressing the disconnection of the original solution's gating module from upstream and downstream processes. The specific implementation is as follows: 1. Structured query reverse optimization process: When the gated output value G is in the fuzzy transition range [0.2, 0.8], the adaptive optimization process of the structured query is triggered, with the following steps: Step a: Identify the nodes in the current query statement that are restricted by knowledge visibility constraints; Step b: Based on the current knowledge quality level, relax the knowledge visibility constraint by one level (e.g., from the 35% level to the 50% level), regenerate the optimized structured query statement, and execute it; Step c: Update the query hit flag H and the symbolic-side score based on the new query results and recompute the gated output value G, repeating until G leaves the fuzzy transition range or the complete knowledge level is reached, thereby achieving dynamic adaptive calibration of the knowledge graph quality assessment. 2. Gating linkage rules for generation link control: The release threshold, safety threshold, and control logic of the link control module are adaptively adjusted based on the gated output value G, as follows: When G≥0.8 (strongly knowledge-supported scenario): the release threshold is lowered by 5 percentage points and the safety threshold is lowered by 3 percentage points, giving priority to the efficient release of answers supported by compliant knowledge; When G≤0.2 (strongly knowledge-unsupported scenario): the release channel is disabled and the safety threshold is raised by 10 percentage points, strengthening the blocking constraints on answers without knowledge support; When 0.2 < G < 0.8 (fuzzy transition scenario): the review mark is forcibly triggered; regardless of whether the comprehensive score reaches the release threshold, no release signal is generated, ensuring that answers with unclear knowledge support status enter the manual review process and tying the gating mechanism to link control throughout the process.
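The three linkage rules can be sketched as a single function. Thresholds are assumed to be expressed on a 0 to 100 scale so that "percentage points" map to plain subtraction and addition; the function name, the dictionary keys, and the base thresholds in the usage lines are hypothetical.

```python
def link_control_adjustment(G: float, release_threshold: float, safety_threshold: float) -> dict:
    """Adjust link control settings from the gated output value G."""
    if G >= 0.8:  # strongly knowledge-supported scenario
        return {"release_threshold": release_threshold - 5,
                "safety_threshold": safety_threshold - 3,
                "release_enabled": True, "force_review": False}
    if G <= 0.2:  # strongly knowledge-unsupported scenario
        return {"release_threshold": release_threshold,
                "safety_threshold": safety_threshold + 10,
                "release_enabled": False, "force_review": False}
    # Fuzzy transition scenario: never release, always send to review.
    return {"release_threshold": release_threshold,
            "safety_threshold": safety_threshold,
            "release_enabled": False, "force_review": True}


for G in (0.963, 0.05, 0.5):
    print(G, link_control_adjustment(G, release_threshold=80, safety_threshold=50))
```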
[0106] In some embodiments, the weight parameters, slope parameter, and path weights of the gating function are adaptively adjusted according to the question type characteristics of the question set, so as to improve the accuracy and applicability of quality assessment in different task scenarios and to match the verification logic of each question type. The question types include single-value query, time-varying query, and comparison query, and the corresponding adaptive adjustment rules are as follows: 1. Parametric definition of question type characteristics: a dedicated set of gating parameters is preset for each question type, and the parameter set includes the gating weight coefficients; 2. Dedicated adaptation rules for different question types: Single-value query: the target is an attribute query on a single entity, and the completeness of the query result has a relatively low impact on the evaluation result. The preset parameter values are 0.05, 0.25, and 15, which strengthen the evaluation weight of the consistency state and increase the sensitivity of gating switching; Time-varying query: the question involves comparing multiple records across multiple time points, and a high degree of completeness of the query result is required. The preset parameter values are 0.25, 0.15, and 8, which strengthen the evaluation weight of result completeness and slow down the gating transition to avoid score jumps caused by the absence of a single record; Comparison query: the question involves multi-dimensional comparison of multiple entities and places high requirements on knowledge visibility and consistency. The preset parameter values are 0.3, 0.2, and 10, which strengthen the evaluation weights of knowledge coverage and answer consistency to keep the evaluation result stable in cross-entity comparison scenarios; 3. Adaptive matching process: when a structured query statement is generated, the question type label of the current question is identified, the corresponding set of gating parameters is matched automatically, and the adaptive configuration of the gating function and path weights is completed without manual intervention, so that adaptation is automatic across all scenarios.
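The adaptive matching process can be illustrated as a lookup from question type label to preset parameter set. The sketch below only carries the three numeric values listed for each question type. The parameter symbols are not legible in the source, so the field names `weight_a`, `weight_b`, and `slope` are hypothetical, and reading the third value as the slope is an assumption based on the remarks about gating sensitivity and transition speed.

```python
from typing import NamedTuple


class GatingParams(NamedTuple):
    weight_a: float  # first preset value (symbol not legible in the source)
    weight_b: float  # second preset value (symbol not legible in the source)
    slope: float     # third preset value, assumed to be the gating slope


# Preset values copied from the three question types described above.
GATING_PRESETS = {
    "single_value": GatingParams(0.05, 0.25, 15.0),
    "time_varying": GatingParams(0.25, 0.15, 8.0),
    "comparison": GatingParams(0.30, 0.20, 10.0),
}


def select_gating_params(question_type: str) -> GatingParams:
    """Return the preset gating parameters for a question type label."""
    try:
        return GATING_PRESETS[question_type]
    except KeyError:
        raise ValueError(f"unknown question type: {question_type!r}") from None
```

A caller would obtain the question type label when the structured query is generated and pass it to `select_gating_params` before evaluating the gating function.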
[0107] Module 5. Scoring result feedback and link control module: Step 11. Link control based on the comprehensive score: Input: the comprehensive score obtained in Step 10, the symbolic-side score obtained in Step 6, the neural-side score obtained in Step 9, and the query hit flag H.
[0108] Operation: As shown in Figure 4, link control signals or result status markers are generated based on the comprehensive score and its constituent results. These signals participate in the output status management, risk warning, or review processing of the current natural language answer. In this invention, the standard output types of Module 5 uniformly include: release signal, insufficient knowledge support warning signal, review marker, blocking signal, and status record marker; unless otherwise specified, any other warning, review, or status description mentioned above belongs to the corresponding category among these standard output types. This step does not change the score generation logic described above; it further converts the scoring results into link control criteria that the system can execute. Specifically, when H=1: if the comprehensive score is higher than the preset release threshold, a release signal is generated so that the current answer is returned directly; if the comprehensive score is within the preset prompt threshold range, a review marker or status record marker is generated to trigger review processing or status registration; if the comprehensive score is lower than the preset safety threshold, a blocking signal or review marker is generated so that the current answer is blocked from returning or enters review processing. When H=0: if the comprehensive score is not lower than the preset safety threshold, an insufficient knowledge support warning signal and a review marker are generated to trigger a prompt output and initiate review processing; if the comprehensive score is lower than the preset safety threshold, an insufficient knowledge support warning signal and a blocking signal are generated to trigger a prompt output and prevent the current answer from being returned directly. Furthermore, when the symbolic-side score or the neural-side score is too low, a status record marker can be generated and stored for later problem localization and statistical analysis. In this invention, both the review path and the blocking path can be combined with the insufficient knowledge support warning signal to form a combined link control path for the miss case.
Figure 4 shows the rule branches of the scoring result feedback and link control module under different query hit statuses and comprehensive score ranges. In Figure 4, the comprehensive score, the symbolic-side score, the neural-side score, the query hit flag H, the release threshold, and the safety threshold serve as decision inputs. When H=1, the system enters the release, review, or blocking path according to the comprehensive score: a release signal is generated if the score is higher than the preset release threshold, and a blocking signal or review marker is generated if it is lower than the preset safety threshold. When H=0, the system does not enter the release channel; instead, according to the comprehensive score, it enters either the prompt-and-review path or the prompt-and-block path. The release threshold is used to determine whether the current generation link meets the conditions for direct output; the safety threshold is used to determine whether the current generation link is in a risk state that requires blocking or review. In one exemplary setting, the release threshold can be set to 80 points and the safety threshold to 40 points. These thresholds only illustrate the judgment logic of this embodiment and do not limit the range of threshold values.
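The rule branches of the link control step can be sketched as a single decision function. This is an illustrative sketch under stated assumptions: the signal names and the function `link_control` are hypothetical, the thresholds default to the exemplary values of 80 and 40, and where the text allows "a blocking signal or review marker" for a hit with a score below the safety threshold, the sketch arbitrarily picks the blocking signal. The status record marker is left out because the text does not quantify when a side score counts as too low.

```python
from typing import List

RELEASE = "release_signal"
WARNING = "insufficient_knowledge_support_warning"
REVIEW = "review_marker"
BLOCK = "blocking_signal"


def link_control(score: float, hit: int,
                 release_threshold: float = 80.0,
                 safety_threshold: float = 40.0) -> List[str]:
    """Map a comprehensive score and query hit flag H to control signals."""
    if hit == 1:
        if score > release_threshold:
            return [RELEASE]
        if score < safety_threshold:
            return [BLOCK]  # assumption: block rather than review
        return [REVIEW]     # score lies in the prompt threshold range
    # Query miss: the release channel is never entered.
    if score >= safety_threshold:
        return [WARNING, REVIEW]
    return [WARNING, BLOCK]
```

For instance, a hit with a score above 80 yields only the release signal, while a miss with a score below 40 yields the warning signal together with the blocking signal.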
[0109] Output: The link control signals or result status markers.
[0110] Example 1: This example uses a remote sensing ecological knowledge graph question-and-answer scenario to illustrate the system composition and data flow relationships between modules of the method described in this invention. The knowledge graph is organized from underlying structured data objects with clear physical or business attributes, including at least grid cells, observation records, and time points. The observation records may contain ecological indicator fields such as NDVI. The question set covers single-value queries, time-varying questions, and comparison questions, used to represent generation task scenarios under different time conditions, different comparison objects, and different query conditions.
[0111] In this implementation scenario, the knowledge graph organizes structured observation data in the remote sensing ecological monitoring scenario using a grid cell-observation record-time point association method. The observation record contains at least ecological indicator fields such as NDVI. The quality level is used to represent the evaluation status under different knowledge visibility conditions. In the current implementation, four knowledge quality levels are used: 100%, 75%, 50%, and 35%. The question set consists of question instances constructed around grid cells, time conditions, and ecological indicators, which are used to verify the response of the unified support capability score described in this invention under different knowledge states.
[0112] In this embodiment, the knowledge graph corresponds to the structured observation data organization results in the remote sensing ecological monitoring scenario, and establishes a relationship around spatial grid units, annual observation records and time points; the quality level is formed by setting different visibility ranges for complete knowledge content, and is used to characterize the query support conditions under different knowledge availability states; the implementation object is a set of question instances constructed around grid units, time conditions and ecological indicators, which is used to observe the changes in query support status, answer behavior status and comprehensive scoring results under different knowledge states.
[0113] In one implementation, the quality tiers can be configured by setting a set of visible tiers for nodes or knowledge units directly related to the query results; when executing queries under different tiers, only knowledge content within the visible range corresponding to the current tier is allowed to be accessed, thereby forming a hierarchical evaluation environment between complete knowledge states and restricted knowledge states.
[0114] 1. System Composition: In this embodiment, the modules below are implementation-level refinements of Modules 1-5 described above. The system includes at least the following modules for this scenario: (1) a scenario data input module, used to receive the remote sensing ecological knowledge graph, the question set built around grid cells and annual NDVI, and the quality level configuration; (2) a query generation module, used to generate the corresponding Cypher query statement from the question text and graph pattern information; (3) a query statement optimization module, used to add quality level visibility constraints to target variables such as grid cells and observation records; (4) a query execution module, used to execute the optimized query statement in the graph database and output the query results; (5) a symbolic-side evaluation module, used to form an instantiated query support evaluation based on the query execution status, the number of records, and the knowledge visibility; (6) an answer generation module, which inputs the user question and query results into a large language model to generate a natural language answer for the current question; (7) a neural-side evaluation module, which forms an answer behavior evaluation based on the refusal, uncertainty, and consistency features in the answer; (8) a gated dual-path fusion scoring module, which switches between the symbolic-side evaluation and the neural-side evaluation according to the query hit flag H and outputs a comprehensive score; (9) a scoring result feedback and link control module, which generates release signals, insufficient knowledge support warning signals, review markers, blocking signals, or status record markers based on the comprehensive score, symbolic-side score, neural-side score, and query hit flag; and (10) a result output module, which outputs question-level scoring results, link control results, and level-wise statistical results. The above modules together serve the overall mechanism of query support status identification, dual-side evaluation, gated path switching, unified support capability score output, and link control result generation.
[0115] 2. Data Flow Relationships Between Modules: In this scenario, the data input module first sends the question set, quality level configuration, and knowledge graph pattern information into the system; the query generation module generates the original Cypher query statement based on the question text, and the query statement optimization module adds visibility constraints to target variables such as grid cells and observation records according to the current level; the query execution module outputs the query status, number of records, and structured results, thereby first identifying whether the current question has obtained query support from the knowledge graph at the current level; then the query results are sent to the symbolic evaluation module to form a support evaluation, and together with the original question, they are sent to the answer generation module to form a natural language answer; the neural evaluation module then extracts behavioral features based on the answer and forms an answer behavior evaluation; the gated dual-path fusion scoring module combines the query hit flag H to perform scenario-based path switching on the two evaluations, obtaining a unified support capability score for the question at the current knowledge quality level; the scoring results are fed back to the link control module, which then generates corresponding link control signals or result status markers based on the comprehensive score, symbolic score, neural score, and query hit flag, and sends them to the result output module.
[0116] Example 2: Query generation, query statement optimization, and query execution; This example uses the question "What is the vegetation cover (NDVI) of grid G0378 in 2015?" to illustrate the specific implementation process of "natural language question -> structured query generation -> query statement optimization -> query execution" in this scenario. Time-varying questions and comparison questions follow the same process, with the time conditions or the number of comparison objects extended accordingly.
[0117] Step 1. Question Input: Input: The question text is: "What is the vegetation cover (NDVI) of grid G0378 in 2015?".
[0118] Operation: Input the question as natural language and send it to the query generation stage.
[0119] Output: The natural language question to be converted. Example output: The natural language question to be converted is "What is the vegetation cover (NDVI) of grid G0378 in 2015?".
[0120] Step 2. Generating Structured Query Statements: Input: The natural language question from step 1 and the knowledge graph pattern information.
[0121] Operation: Generate the corresponding structured query statement based on the target entity, time condition, and target attribute in the question.
[0122] Example: The generated original Cypher query statement is as follows, where GID represents the grid number and NDVI represents the vegetation cover value: MATCH (g:GridCell {GID: 'G0378'})-[:HASOBSERVATION]->(o)-[:INYEAR]->(t:TimePoint {year: 2015}) RETURN o.NDVI.
[0123] Output: The original structured query statement.
[0124] Output example: MATCH (g:GridCell {GID: 'G0378'})-[:HASOBSERVATION]->(o)-[:INYEAR]->(t:TimePoint {year: 2015}) RETURN o.NDVI.
[0125] Step 3. Query statement optimization: Input: The original structured query statement obtained in step 2 and the current quality level.
[0126] Operation: Parse the MATCH fragment in the original query statement, identify the target node variables that are directly related to the answer, and attach quality level visibility constraints to them according to the entity type corresponding to the variable, so that the query only accesses the knowledge content that is visible at the current level.
[0127] In this embodiment, by parsing the original query statement, the target variables directly related to the answer can be identified, including: g corresponds to the grid cell node; o corresponds to the observation record node.
[0128] When the quality level is 75%, the corresponding knowledge visibility constraint is appended to the query condition to obtain the optimized Cypher query statement, where QL represents the set of visibility levels: MATCH (g:GridCell {GID: 'G0378'})-[:HASOBSERVATION]->(o)-[:INYEAR]->(t:TimePoint {year: 2015}) WHERE '75%' IN g.QL AND '75%' IN o.QL RETURN o.NDVI.
[0129] When the quality level is 35%, the optimized Cypher query statement is: MATCH (g:GridCell {GID: 'G0378'})-[:HASOBSERVATION]->(o)-[:INYEAR]->(t:TimePoint {year: 2015}) WHERE '35%' IN g.QL AND '35%' IN o.QL RETURN o.NDVI.
[0130] Output: The optimized query statement corresponding to the current quality level.
[0131] Output examples: The optimized query for the 75% tier includes the constraint '75%' IN g.QL AND '75%' IN o.QL; the optimized query for the 35% tier includes the constraint '35%' IN g.QL AND '35%' IN o.QL.
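The query statement optimization step can be approximated with simple string rewriting. The sketch below is only an illustration under stated assumptions: it inserts the visibility constraint in front of the first RETURN keyword and does not actually parse the MATCH fragment, the target variables are passed in by the caller instead of being identified automatically, and the names `add_visibility_constraints` and `EXAMPLE_QUERY` are hypothetical.

```python
import re

# Original query from Step 2 of this example.
EXAMPLE_QUERY = (
    "MATCH (g:GridCell {GID: 'G0378'})-[:HASOBSERVATION]->(o)"
    "-[:INYEAR]->(t:TimePoint {year: 2015}) RETURN o.NDVI"
)


def add_visibility_constraints(cypher: str, level: str,
                               variables=("g", "o"), prop: str = "QL") -> str:
    """Insert quality-level visibility constraints before the RETURN clause."""
    match = re.search(r"\bRETURN\b", cypher, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("query has no RETURN clause")
    head = cypher[:match.start()].rstrip()
    tail = cypher[match.start():]
    # One membership test per target variable, e.g. '75%' IN g.QL.
    constraint = " AND ".join(f"'{level}' IN {var}.{prop}" for var in variables)
    # Extend an existing WHERE clause with AND, otherwise start a new one.
    has_where = re.search(r"\bWHERE\b", head, flags=re.IGNORECASE) is not None
    keyword = "AND" if has_where else "WHERE"
    return f"{head} {keyword} {constraint} {tail}"
```

Calling `add_visibility_constraints(EXAMPLE_QUERY, "75%")` produces the query with the clause `WHERE '75%' IN g.QL AND '75%' IN o.QL` placed before `RETURN o.NDVI`. A production version would rely on a real Cypher parser, since this string-based approach would be misled by a RETURN keyword inside a string literal.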
[0132] Step 4. Query execution: Input: The optimized query statement obtained in step 3.
[0133] Operation: Execute the optimized query statement in the knowledge graph database and record the query execution status, the number of records returned, and the content of the query results.
[0134] Example 1: After executing the optimized query statement for the 75% quality level, the returned result is: NDVI=0.7478965665111487; number of result records N=1; query hit flag H=1.
[0135] Example 2: After executing the optimized query statement at the 35% quality level, the returned result is: number of result records N=0; null flag E=1; missing category = A-class missing. This indicates that the query for this question missed at the current quality level, and the reason for the miss is that the relevant node was filtered out at this quality level.
[0136] Output: The query execution status, the number of result records, and the structured query results.
[0137] Output examples: At the 75% level, the output is "Query hit, number of result records is 1, return NDVI value"; at the 35% level, the output is "Query miss, number of result records is 0, result is empty and missing category is A-type filtered missing".
[0138] Example 3: Symbolic-side feature extraction and symbolic-side score calculation; This example continues with the two query results from Example 2, the hit at the 75% quality level and the miss at the 35% quality level, to illustrate the symbolic-side feature extraction method and the calculation of the symbolic-side score. This embodiment verifies whether the current question has queryable support at the current quality level, and how the support status is instantiated and represented.
[0139] Step 1. Symbol-side feature extraction: Input: The query execution result obtained in Example 2, the number of result records, and the current quality level.
[0140] Operation: Extract symbolic-side features from the query execution results to characterize the instantiated support capability under the coupled state of the current question, the current quality level, and the current query results. The symbolic-side features include at least: (1) the knowledge visibility index; (2) the query hit flag; (3) the number of result records. When the number of result records is greater than 0, the query hit flag is 1; when the number of result records is equal to 0, the query hit flag is 0.
[0141] Example 1: 75% quality level hit scenario. From the execution results of Example 2, we can obtain: knowledge visibility V=75; query hit flag H=1; number of result records N=1.
[0142] Example 2: Scenario where 35% quality level is not hit. From the execution results of Example 2, we can obtain: knowledge visibility V=35; query hit flag H=0; number of result records N=0.
[0143] Output: Obtain the symbolic feature set.
[0144] Output examples: The symbolic-side feature set at the 75% level can be represented as {V=75, H=1, N=1}; the symbolic-side feature set at the 35% level can be represented as {V=35, H=0, N=0}.
[0145] Step 2. Calculate the symbol-based score: Input: The set of symbolic side features obtained in step 1.
[0146] Operation: Calculate a symbolic-side score based on the knowledge visibility index and the query hit flag. This score quantifies whether the current question has queryable support at the current quality level. The calculation formula is: symbolic-side score = α × V + β × 100 × H.
[0147] The weight parameters satisfy the non-negativity constraint; in a preferred setting, the weights in the same group sum to 1. The weight parameters can be determined empirically or by parameter tuning on a validation set, and can be optimized for specific application scenarios.
[0148] Under the feasible parameter configuration described in Example 6, α=0.4 and β=0.6.
[0149] Example 1: In the 75% quality level hit scenario, substituting α=0.4, β=0.6, V=75, and H=1 gives 0.4 × 75 + 0.6 × 100 × 1 = 90. Therefore, the symbolic-side score at the 75% level is 90.
[0150] Example 2: In the 35% quality level miss scenario, substituting α=0.4, β=0.6, V=35, and H=0 gives 0.4 × 35 + 0.6 × 100 × 0 = 14. Therefore, the symbolic-side score at the 35% level is 14.
[0151] Output: The symbolic-side score.
[0152] Output example: The symbolic-side score output at the 75% level is 90; the symbolic-side score output at the 35% level is 14.
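The symbolic-side steps of this example can be reproduced with a short sketch. It assumes that the score is a weighted sum of the visibility index V (on a 0-100 scale) and the hit flag H scaled to 100, which is a form that reproduces the stated values of 90 and 14 with α=0.4 and β=0.6. The function names and the record representation (a plain list of result rows) are hypothetical.

```python
from typing import Dict, List


def symbolic_features(visibility_percent: float, records: List[dict]) -> Dict[str, float]:
    """Derive V, H, and N from the quality level and the returned records."""
    n = len(records)
    return {"V": visibility_percent, "H": 1 if n > 0 else 0, "N": n}


def symbolic_score(v: float, h: int, alpha: float = 0.4, beta: float = 0.6) -> float:
    """Weighted sum of visibility (0-100) and hit flag (scaled to 100)."""
    return alpha * v + beta * 100.0 * h


# 75% level: one record returned, so H=1.
hit = symbolic_features(75, [{"NDVI": 0.7478965665111487}])
# 35% level: empty result, so H=0.
miss = symbolic_features(35, [])

hit_score = symbolic_score(hit["V"], hit["H"])
miss_score = symbolic_score(miss["V"], miss["H"])
```

With the two query results of Example 2, `hit_score` evaluates to 90 and `miss_score` to 14 (up to floating-point rounding), matching the worked values.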
[0153] Example 4: Neural Feature Extraction and Neural Score Calculation; This example continues to use the query results from Example 2 to illustrate how to extract neural features from natural language answers and calculate neural scores in this scenario. This embodiment evaluates the response behavior of a large language model under a given query result state, and whether this behavior is consistent with the knowledge support state.
[0154] Step 1. Natural Language Answer Generation: Input: User question and structured query results obtained from Example 2.
[0155] Operation: Input the user's question and query results into the large language model to generate the corresponding natural language answer.
[0156] Example 1: 75% quality hit scenario. When the query is hit, the large language model generates the following answer: "The vegetation cover (NDVI) of grid G0378 in 2015 was 0.748".
[0157] Example 2: 35% quality level miss scenario. When the query misses, the large language model generates the following answer: "Based on the current query results, it is not possible to determine the specific value of the vegetation cover (NDVI) of grid G0378 in 2015."
[0158] Output: The natural language answer to each question.
[0159] Output examples: At the 75% level, the output answer is "The vegetation cover (NDVI) of grid G0378 in 2015 was 0.748"; at the 35% level, the output answer is "Based on the current query results, it is not possible to determine the specific value of the vegetation cover (NDVI) of grid G0378 in 2015."
[0160] Step 2. Neural side feature extraction: Input: The natural language answer generated in step 1 and the status of the query results.
[0161] Operation: Extract neural side features from the natural language answers to characterize the response behavior of the large language model in the current query result state and its coordination with the knowledge support state. This step is used for contextual evaluation of answer behavior.
[0162] In this embodiment, let the current natural language answer text be A, and let a set of refusal templates and a set of uncertainty templates be given, together with a string matching function. On this basis, a refusal matching score and an uncertainty matching score are obtained for A. The refusal flag R is determined from the refusal matching score, and the uncertainty flag U is determined from the uncertainty matching score. The answer-result consistency flag C is determined from the coordination between the query hit status and the answer behavior.
Example 1: 75% quality level hit scenario. From the natural language answer in this scenario, the following can be extracted: refusal flag R=0; uncertainty flag U=0; consistency flag C=1. This indicates that the answer is not a refusal, contains no expression of uncertainty, and is consistent with the query result.
[0163] Example 2: 35% quality level miss scenario. Based on the natural language answer in this scenario, we can extract: rejection flag R=0; uncertainty flag U=1; C=1; indicating that the answer did not trigger an explicit rejection, but there is uncertainty expression, which is consistent with the empty result state.
[0164] Output: The set of neural side features is obtained.
[0165] Output examples: The neural side feature set at 75% level can be represented as {R=0, U=0, C=1}; the neural side feature set at 35% level can be represented as {R=0, U=1, C=1}.
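The neural-side feature extraction can be sketched with template matching. Everything specific here is an assumption made for illustration: the template phrases, the substring-based matching function, the 0.5 decision threshold, and the consistency rule (a direct answer is consistent with a hit; a refusal or uncertain answer is consistent with a miss). The source defines these through formulas that are not legible, so this sketch only aims to reproduce the two feature sets of this example.

```python
from typing import Dict, Sequence

# Hypothetical template phrases; the actual template sets are not given.
REFUSAL_TEMPLATES = ("cannot answer", "unable to answer", "refuse to answer")
UNCERTAINTY_TEMPLATES = ("not possible to determine", "cannot be determined",
                         "not sure", "uncertain")

HIT_ANSWER = "The vegetation cover (NDVI) of grid G0378 in 2015 was 0.748"
MISS_ANSWER = ("Based on the current query results, it is not possible to "
               "determine the specific value of the vegetation cover (NDVI) "
               "of grid G0378 in 2015.")


def match_score(answer: str, templates: Sequence[str]) -> float:
    """Return 1.0 if any template occurs in the answer, else 0.0."""
    text = answer.lower()
    return 1.0 if any(t in text for t in templates) else 0.0


def neural_features(answer: str, hit: int, threshold: float = 0.5) -> Dict[str, int]:
    """Extract refusal flag R, uncertainty flag U, and consistency flag C."""
    r = int(match_score(answer, REFUSAL_TEMPLATES) >= threshold)
    u = int(match_score(answer, UNCERTAINTY_TEMPLATES) >= threshold)
    if hit == 1:
        c = int(r == 0 and u == 0)   # assumed rule: direct answer fits a hit
    else:
        c = int(r == 1 or u == 1)    # assumed rule: hedged answer fits a miss
    return {"R": r, "U": u, "C": c}
```

Applied to the two answers of this example, the sketch yields {R=0, U=0, C=1} for the hit scenario and {R=0, U=1, C=1} for the miss scenario.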
[0166] Step 3. Calculate the neural-side score: Operation: A neural-side score is calculated based on the refusal flag, the uncertainty flag, and the consistency flag. This neural-side score quantifies the degree of coordination between the answer behavior and the current knowledge support state. To keep the evaluation direction of answer behavior in query-hit and query-miss scenarios consistent with the actual support semantics, this embodiment employs a scenario-adaptive neural-side score formula.
[0167] The calculation formula is: when H=1, neural-side score = 100 × [λR × (1 − R) + λU × (1 − U) + λC × C]; when H=0, neural-side score = 100 × [λR × R + λU × U + λC × C], where λR, λU, and λC are the weights of the refusal flag, the uncertainty flag, and the consistency flag, respectively.
[0168] Under a feasible parameter configuration, take λR=0.15, λU=0.15, and λC=0.7.
[0169] Example 1: 75% quality level hit scenario. With λR=0.15, λU=0.15, λC=0.7, and H=1, R=0, U=0, C=1: neural-side score = 100 × (0.15 × 1 + 0.15 × 1 + 0.7 × 1) = 100.
[0170] Example 2: 35% quality level miss scenario. With λR=0.15, λU=0.15, λC=0.7, and H=0, R=0, U=1, C=1: neural-side score = 100 × (0.15 × 0 + 0.15 × 1 + 0.7 × 1) = 85.
[0171] Output: The neural-side score.
[0172] It should be noted that this example is used to illustrate a common type of uncertain response behavior in query miss scenarios, and does not limit the query miss scenario to manifest as an explicit refusal to answer; when the answer simultaneously manifests as an explicit refusal to answer and an expression of uncertainty, the neural side score can be further improved.
[0173] Output example: The neural-side score output at the 75% level is 100; the neural-side score output at the 35% level is 85.
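A scenario-adaptive neural-side score can be sketched as follows. The branch structure is an assumption inferred from the worked values of this example (100 for the hit scenario and 85 for the miss scenario) and from the remark that a simultaneous refusal and uncertainty expression raises the score further in the miss scenario: on a hit, not refusing and not hedging are rewarded; on a miss, refusing and hedging are rewarded. The function and parameter names are hypothetical.

```python
def neural_score(h: int, r: int, u: int, c: int,
                 w_refusal: float = 0.15,
                 w_uncertainty: float = 0.15,
                 w_consistency: float = 0.7) -> float:
    """Scenario-adaptive neural-side score on a 0-100 scale."""
    if h == 1:
        # Query hit: a direct, confident, consistent answer scores highest.
        raw = (w_refusal * (1 - r)
               + w_uncertainty * (1 - u)
               + w_consistency * c)
    else:
        # Query miss: refusal or hedging agrees with the empty result.
        raw = (w_refusal * r
               + w_uncertainty * u
               + w_consistency * c)
    return 100.0 * raw
```

Under this sketch, the hit scenario (H=1, R=0, U=0, C=1) scores 100, the miss scenario (H=0, R=0, U=1, C=1) scores 85, and a miss answer that both refuses and expresses uncertainty (R=1, U=1, C=1) scores 100.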
[0174] Example 5: Fusion Score Calculation; This example illustrates the comprehensive score calculation in this scenario. The calculation process is as follows. In this embodiment, the comprehensive score is formed using the query hit flag H as the scenario differentiation condition. The corresponding evaluation path is executed under different knowledge support scenarios, and a unified support capability score output is formed.
[0175] Step 1. Determine the fusion control parameters: Input: Query hit flag H.
[0176] Operation: The query hit flag H is used as a gating condition to distinguish between two evaluation scenarios, "the knowledge graph has provided retrieval support" and "the knowledge graph has not provided retrieval support," and accordingly triggers a switch in the evaluation semantics. To give the formula a formal expression, a smooth gating function is further defined in this embodiment: G = 1 / (1 + exp(−γ × (H − θ))), where γ is the slope parameter and θ is the gating offset. Under the feasible parameter configuration described in Example 6, take θ=0.5, γ=10.
[0177] Output: The output G of the gating function. Output example: In a query hit scenario, G≈0.993; in a query miss scenario, G≈0.007.
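The stated gating outputs can be checked numerically with a logistic function of the hit flag. The logistic form, the name `gate`, and the parameter names `offset` and `slope` are assumptions; they are chosen because a logistic curve with offset 0.5 and slope 10 reproduces G≈0.993 for H=1 and G≈0.007 for H=0.

```python
import math


def gate(h: float, offset: float = 0.5, slope: float = 10.0) -> float:
    """Smooth gating function: logistic curve centered at `offset`."""
    return 1.0 / (1.0 + math.exp(-slope * (h - offset)))


g_hit = gate(1)    # query hit
g_miss = gate(0)   # query miss
```

Here `g_hit` is about 0.9933 and `g_miss` is about 0.0067, and the two values sum to 1 because the curve is symmetric around the offset.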
[0178] Step 2. Calculate the comprehensive score: Input: the symbolic-side score and the neural-side score obtained from the preceding scoring steps, and the output G of the gating function obtained in step 1.
[0179] Operation: Based on the query hit status, gated path switching and fusion are applied to the symbolic side score and the neural side score to obtain a comprehensive score. In the hit scenario, the path focuses on evaluating the utilization and consistency of the retrieved knowledge in the answer, while in the miss scenario, the path focuses on evaluating the insufficient accessibility of knowledge and its constraints on the answering behavior. Therefore, the gating in this step is used to trigger the corresponding evaluation path according to different support states and realize the corresponding evaluation semantic switching.
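[0180] Formula: comprehensive score = G × (w1 × symbolic-side score + w2 × neural-side score) + (1 − G) × (w3 × symbolic-side score + w4 × neural-side score), where (w1, w2) is the path weight set for the knowledge-supported scenario and (w3, w4) is the path weight set for the knowledge-unsupported scenario. In this embodiment, the two scenarios correspond to their respective preset path weight sets, and the specific values are not limited.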
[0181] This means that when a query is hit, a path that emphasizes answer utilization and consistency is adopted; when a query is not hit, a path that emphasizes insufficient knowledge accessibility and its behavioral constraint effect is adopted.
[0182] The symbolic side score and neural side score in this embodiment are calculated using the results of Examples 3 and 4 under the above-described feasible parameter configurations, respectively.
[0183] Example 1: For the hit scenario, the symbolic-side score and the neural-side score obtained from the preceding scoring steps are 90 and 100, respectively, and the query hits (H=1). With the gating offset θ=0.5 and the slope γ=10, G≈0.993. Substituting into the comprehensive scoring formula gives: comprehensive score ≈ 0.993 × (w1 × 90 + w2 × 100) + 0.007 × (w3 × 90 + w4 × 100). Since G is close to 1, the evaluation focuses on the path corresponding to the knowledge-supported scenario.
[0184] Example 2: For the miss scenario, the symbolic-side score and the neural-side score obtained from the preceding scoring steps are 14 and 85, respectively, and the query misses (H=0). With the gating offset θ=0.5 and the slope γ=10, G≈0.007. Substituting into the comprehensive scoring formula gives: comprehensive score ≈ 0.007 × (w1 × 14 + w2 × 85) + 0.993 × (w3 × 14 + w4 × 85). Since G is close to 0, the evaluation focuses on the path corresponding to the knowledge-unsupported scenario.
[0185] Output: The comprehensive score, which is the unified support capability score for the current question at the current quality level.
[0186] Output example: In the hit scenario, the comprehensive score is approximately obtained by fusing 90 and 100 according to the path weights of the hit scenario, and can be expressed as: comprehensive score ≈ w1×90 + w2×100. In the miss scenario, the comprehensive score is approximately obtained by fusing 14 and 85 according to the path weights of the miss scenario, and can be expressed as: comprehensive score ≈ w3×14 + w4×85.
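The gated dual-path fusion can be worked through numerically. The sketch below assumes a logistic gate with offset 0.5 and slope 10 and a fusion of the form G × (hit path) + (1 − G) × (miss path), and it uses the path weights w1=0.4, w2=0.6, w3=0.8, w4=0.2 given for this embodiment in the link control step. The function names are hypothetical.

```python
import math


def gate(h: float, offset: float = 0.5, slope: float = 10.0) -> float:
    """Logistic gating function (assumed form)."""
    return 1.0 / (1.0 + math.exp(-slope * (h - offset)))


def fused_score(s_sym: float, s_neu: float, h: int,
                w1: float = 0.4, w2: float = 0.6,
                w3: float = 0.8, w4: float = 0.2) -> float:
    """Blend the knowledge-supported and knowledge-unsupported paths by G."""
    g = gate(h)
    supported = w1 * s_sym + w2 * s_neu     # path dominant when G is near 1
    unsupported = w3 * s_sym + w4 * s_neu   # path dominant when G is near 0
    return g * supported + (1.0 - g) * unsupported


hit_total = fused_score(90.0, 100.0, h=1)
miss_total = fused_score(14.0, 85.0, h=0)
```

Under these assumptions `hit_total` is about 95.97, which is above the exemplary release threshold of 80, and `miss_total` is about 28.39, which is below the exemplary safety threshold of 40. Both results agree with the release and blocking outcomes described for the two scenarios.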
[0187] Step 3. Generate link control signals: Input: the comprehensive score obtained in step 2, the symbolic-side score and the neural-side score obtained from the preceding scoring steps, and the query hit flag H.
[0188] Operation: Generate link control signals or result status markers based on the comprehensive score and its components. The query hit flag H and the comprehensive score serve as the primary criteria for link control decisions, while the symbolic-side score and the neural-side score mainly serve as the basis for additional status recording and anomaly localization. In one example implementation, the release threshold can be set to 80 points, the prompt threshold range to 40 to 80 points, and the safety threshold to 40 points. The release threshold, prompt threshold range, and safety threshold are globally preset thresholds for the link control phase. When H=1, if the comprehensive score is higher than the release threshold, the result enters the release channel; if the comprehensive score is within the prompt threshold range, a review marker is generated; if the comprehensive score is lower than the safety threshold, a blocking signal or review marker is generated. When H=0, the result does not enter the release channel; if the comprehensive score is not lower than the safety threshold, an insufficient knowledge support warning signal and a review marker are generated; if the comprehensive score is lower than the safety threshold, an insufficient knowledge support warning signal and a blocking signal are generated.
[0189] Furthermore, in this embodiment, the hit scenario and the miss scenario use the same set of implementable parameter configurations, namely, w1=0.4, w2=0.6, w3=0.8, and w4=0.2. The difference between the two types of scenarios is not that different parameter sets are used, but that the values of the query hit flag H are different, which leads to different gating function outputs G, and the comprehensive scoring results are dominated by the knowledge-supported path weight set (w1, w2) or the knowledge-unsupported path weight set (w3, w4), respectively.
[0190] For ease of explanation, the aforementioned comprehensive scoring formula is restated in this step as follows: comprehensive score = G × (w1 × symbolic-side score + w2 × neural-side score) + (1 − G) × (w3 × symbolic-side score + w4 × neural-side score). Example 1: In this embodiment, under the same set of feasible parameter configurations, the hit scenario uses w1=0.4, w2=0.6, w3=0.8, and w4=0.2. Since H=1 in this scenario, G≈0.993, and the comprehensive score is dominated by the knowledge-supported path weight set (w1, w2). From step 2, the symbolic-side score is 90 and the neural-side score is 100 in this scenario, so: comprehensive score ≈ 0.993 × (0.4 × 90 + 0.6 × 100) + 0.007 × (0.8 × 90 + 0.2 × 100) ≈ 96.0. Since the comprehensive score is higher than the release threshold and the query hit flag H=1, the current question has received effective query support at the current knowledge quality level, and there is no obvious conflict between the current answer behavior and the knowledge support status. Therefore, a release signal is generated.
[0191] The release signal is used to instruct the subsequent result processing to perform the following operations: (1) allow the current natural language answer to enter the result output link; (2) mark the current result status as "knowledge support has been obtained and output is allowed"; (3) do not trigger the insufficient knowledge support prompt; (4) do not trigger the manual review process; (5) do not trigger the blocking process.
[0192] Therefore, in this hit scenario, the comprehensive score is not only used to characterize the knowledge graph's ability to support the current generation task, but also further participates in the result release control.
[0193] Example 2: In this embodiment, under the same set of feasible parameter configurations, the miss scenario also uses w1=0.4, w2=0.6, w3=0.8, and w4=0.2. Since H=0 in this scenario, G≈0.007, and the comprehensive score is dominated by the knowledge-unsupported path weight set (w3, w4). From step 2, the symbolic-side score is 14 and the neural-side score is 85 in this scenario, so: comprehensive score ≈ 0.007 × (0.4 × 14 + 0.6 × 85) + 0.993 × (0.8 × 14 + 0.2 × 85) ≈ 28.4. The comprehensive score is below the safety threshold and the query hit flag H=0, and both indicate that the current question has not been supported by effective query results at the current knowledge quality level. Therefore, an insufficient knowledge support warning signal and a blocking signal are generated.
[0194] The knowledge support deficiency prompt signal is used to instruct the subsequent result processing to perform the following operations: (1) add a prompt mark "the current answer lacks sufficient knowledge support" to the result status; (2) generate corresponding support deficiency prompt information; (3) provide status basis for subsequent review or abnormal sample screening.
[0195] The blocking signal is used to instruct subsequent result processing to perform the following operations: (1) prevent the current natural language answer from being directly output as a normal result; (2) prohibit the answer from entering the default release return channel; (3) mark the current result status as "knowledge not supported and direct output prohibited"; (4) switch to prompt output or review processing flow when needed; (5) retain the query status, scoring result and answer text corresponding to the current sample and store them as status markers, which can be used for problem location and statistical analysis later.
[0196] Therefore, in this no-hit scenario, the comprehensive score not only reflects the weak support capability of the knowledge graph for the generation task, but also can further drive the system to block the direct return of results that do not have sufficient knowledge support.
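The two single-sample calculations above can be reproduced with a short sketch. It assumes that the gating function is a logistic function of γ·(H − 0.5) with γ=10, which yields G≈0.993 for H=1 and G≈0.007 for H=0, and that the comprehensive score blends the supported-path sum (w1, w2) and the unsupported-path sum (w3, w4) by G. The function and parameter names are illustrative and are not taken from the original text.

```python
import math


def gate(hit_flag: int, theta: float = 0.5, gamma: float = 10.0) -> float:
    """Smooth gate (assumed logistic form): close to 1 on a hit, close to 0 on a miss."""
    return 1.0 / (1.0 + math.exp(-gamma * (hit_flag - theta)))


def comprehensive_score(s_sym: float, s_neu: float, hit_flag: int,
                        weights=(0.4, 0.6, 0.8, 0.2)) -> float:
    """Gated dual-path fusion of the symbolic-side and neural-side scores."""
    w1, w2, w3, w4 = weights
    g = gate(hit_flag)
    supported_path = w1 * s_sym + w2 * s_neu      # dominates when H = 1
    unsupported_path = w3 * s_sym + w4 * s_neu    # dominates when H = 0
    return g * supported_path + (1.0 - g) * unsupported_path


hit_score = comprehensive_score(90, 100, hit_flag=1)
miss_score = comprehensive_score(14, 85, hit_flag=0)
print(round(hit_score, 2), round(miss_score, 2))  # 95.97 28.39
```

Because the gate is smooth rather than a hard switch, a small share of the other path's weighted sum still enters each result, which is why the hit value is 95.97 rather than exactly 96.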
[0197] Output: link control signals or result status markers are obtained.
[0198] Under different scoring ranges, the system can generate one or more link control signals or result status markers. The link control signals or result status markers may include one or more of the following types: (1) a release signal, used to allow the result to be output directly; (2) a knowledge support deficiency prompt signal, used to indicate that the current result lacks sufficient knowledge support; (3) a review marker, used to trigger a manual review or abnormal sample inspection process; (4) a blocking signal, used to prevent the current result from being output directly; and (5) a status record marker, used to store the current sample state so that it can be used later for problem location and statistical analysis.
[0199] Output Example: In a hit scenario, the output is a release signal; based on the release signal, the system allows the current answer to enter the normal output path and marks the current result as "supported and released". In a miss scenario, the output is a knowledge support deficiency prompt signal and a blocking signal; the system generates a support deficiency prompt message based on the former, prevents the current answer from being directly output based on the latter, and marks the current result as "knowledge not supported and blocked".
[0200] It should be noted that even if the comprehensive score enters the prompt or review range under certain conditions in a miss scenario, as long as the hit flag H=0, the system will not enter the release channel. Instead, the result is routed to one of only two paths: "knowledge support deficiency prompt + review" or "knowledge support deficiency prompt + blocking".
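The routing rules described above can be sketched as a small decision function. The threshold values 80 and 40 are hypothetical placeholders, since this section does not state the release threshold or the safety threshold numerically. The handling of a hit whose score falls between the two thresholds (a review marker) and the choice of blocking rather than review for a hit below the safety threshold are also assumptions made only for illustration.

```python
from enum import Enum


class Signal(Enum):
    RELEASE = "release"
    SUPPORT_DEFICIENCY_PROMPT = "knowledge support deficiency prompt"
    REVIEW = "review"
    BLOCK = "block"


def control_signals(score: float, hit_flag: int,
                    release_threshold: float = 80.0,   # hypothetical value
                    safety_threshold: float = 40.0):   # hypothetical value
    """Map a comprehensive score and a query hit flag to link control signals."""
    if hit_flag == 1:
        if score > release_threshold:
            return [Signal.RELEASE]
        if score < safety_threshold:
            return [Signal.BLOCK]          # a review marker is the stated alternative
        return [Signal.REVIEW]             # assumed handling of the middle range
    # H = 0: the release channel is never entered, whatever the score.
    if score >= safety_threshold:
        return [Signal.SUPPORT_DEFICIENCY_PROMPT, Signal.REVIEW]
    return [Signal.SUPPORT_DEFICIENCY_PROMPT, Signal.BLOCK]
```

With these placeholder thresholds, the hit sample (95.97, H=1) yields a release signal, and the miss sample (28.39, H=0) yields a prompt signal plus a blocking signal, in line with the output example.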
[0201] It should be noted that the comprehensive score result in this step is a representative single sample example value, used to illustrate the calculation process, and does not correspond to the overall mean obtained by statistically analyzing all samples in Example 6.
[0202] Therefore, the comprehensive scoring results are not only used to form a unified support capability evaluation result, but also serve as a trigger basis for subsequent result output control, enabling the scoring results to directly participate in answer release, prompting, review or blocking processing.
[0203] Example 6: Verification Example Description; This example does not introduce a new scoring mechanism. Instead, based on the aforementioned method steps, it describes the verification of the scoring results under different knowledge quality levels, question types, and parameter conditions. This example builds on the question input, query execution, two-sided evaluation, and fusion scoring process described in Examples 2 to 5. It mainly covers the verification content directly related to the existing main experimental results, including multi-level verification, question type verification, component comparison verification, parameter configuration verification, and the auxiliary reference comparison description. The link control implementation example corresponding to step 11 has already been described in step 3 of Example 5; the supplementary parameter disclosure and the underlying structured data object representation are likewise covered by the aforementioned method steps and the existing verification results in this example. Therefore, this example does not add separate new experimental illustrations.
[0204] (1) Validation Objects and Parameter Configuration: The validation objects include four knowledge quality levels (100%, 75%, 50%, and 35%), three question types (single-value queries, time-change questions, and comparison questions), and different weight configurations for the symbolic side, the neural side, and the gated dual-path fusion. In one feasible parameter configuration, the symbolic-side weights are α=0.4 and β=0.6; the three neural-side weights are 0.15, 0.15, and 0.7; the gating parameters are 0.5 and γ=10; the refusal threshold can be set to 0.6 and the uncertainty threshold to 0.5; and for the path weights, w1=0.4 and w2=0.6 can be used for scenarios where knowledge is supported, while w3=0.8 and w4=0.2 can be used for scenarios where knowledge is not supported. The above parameter configuration illustrates one implementable form of the neural-side scoring formula in step 9 and the link control rule in step 11, and does not limit the scope of protection of this invention.
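As an illustration of how the symbolic-side weights might be applied, the sketch below follows the description of the symbolic-side score in claim 2 as a weighted sum of the knowledge visibility index and the query hit flag. It assumes that α weights the visibility index, that β weights the hit flag, that visibility is expressed as a fraction of the knowledge quality level, and that the result is scaled to 0-100. None of these pairings is stated in this section; they are chosen because they would give 90 for a hit at the 75% level and 14 for a miss at the 35% level.

```python
def symbolic_score(visibility: float, hit_flag: int,
                   alpha: float = 0.4, beta: float = 0.6) -> float:
    """Weighted sum of knowledge visibility (0..1) and hit flag (0 or 1), scaled to 0..100.

    The pairing of alpha with visibility and beta with the hit flag is an assumption.
    """
    if not 0.0 <= visibility <= 1.0:
        raise ValueError("visibility must lie in [0, 1]")
    if hit_flag not in (0, 1):
        raise ValueError("hit_flag must be 0 or 1")
    return 100.0 * (alpha * visibility + beta * hit_flag)


hit_at_75 = symbolic_score(0.75, 1)   # approximately 90
miss_at_35 = symbolic_score(0.35, 0)  # approximately 14
```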
[0205] In this embodiment, the main experiment is denoted as E1, which represents the verification process of performing query, scoring, and statistical analysis on four knowledge quality levels under the same problem set and the same operating environment.
[0206] (2) Verification Process and Statistical Methods: During the verification process, four knowledge quality levels of 100%, 75%, 50%, and 35% are first constructed based on the complete knowledge graph. Then, under each level, the query generation, query statement optimization, and query execution steps described in Example 2 are repeatedly executed on the same question set, and the symbolic side score, neural side score, comprehensive score, and corresponding link control results are calculated according to the methods described in Examples 3, 4, and 5. The question set includes single-value queries, time-varying questions, and comparison questions. The same question set is repeatedly executed under each level to ensure the comparability of statistical results between different levels. The graph database environment, structured query execution environment, and large language model calling environment are consistent with the system operating environment described in Example 1.
[0207] In terms of statistical methods, the mean and standard deviation of the question-level scoring results were calculated separately for each level and question type, and statistical tests were performed on the differences between representative levels. The comprehensive scoring results were then combined with the hit rate, the auxiliary reference items, and the component comparison results to form the verification results shown in Figures 5, 6, and 7 and in Table 1. The values of 95.97 and 28.39 in Example 5 are representative single-sample calculation results for the hit and miss scenarios, respectively, and do not represent the overall mean obtained from all samples in the E1 main experiment at the corresponding knowledge quality level. The link control implementation corresponding to step 11 has already been given in step 3 of Example 5; therefore, this example does not add new experimental illustrations for this subsequent usage.
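The per-level, per-question-type statistics can be sketched as follows. The record layout and the four score values are made-up placeholders used only to show the grouping, not experimental data, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(records):
    """Group (level, question_type, score) records and return mean and sample std per group."""
    groups = defaultdict(list)
    for level, question_type, score in records:
        groups[(level, question_type)].append(score)
    return {
        key: (mean(scores), stdev(scores) if len(scores) > 1 else 0.0)
        for key, scores in groups.items()
    }


# Placeholder records for illustration only.
sample_records = [
    ("75%", "comparison", 90.0),
    ("75%", "comparison", 70.0),
    ("35%", "comparison", 30.0),
    ("35%", "comparison", 50.0),
]
summary = summarize(sample_records)
level_gap = summary[("75%", "comparison")][0] - summary[("35%", "comparison")][0]
```

The inter-level difference reported in the verification results corresponds to `level_gap` here, computed between the 75% and 35% group means.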
[0208] Figure 5 shows the correspondence between the hit rate and the comprehensive score in the E1 main experiment. As Figure 5 shows, as the knowledge quality level decreases from 100% to 35%, both the hit rate and the comprehensive score decrease, indicating that the comprehensive score of this invention changes consistently as knowledge quality deteriorates and corresponds to the query hit rate. Figure 6 compares the level-wise monotonicity of the three types of scores in the E1 main experiment: the symbolic-side score, the neural-side score, and the comprehensive score. As Figure 6 shows, all three types of scores decrease as the knowledge quality level decreases. Among them, the comprehensive score shows a clear monotonic response under multi-level conditions, indicating that the present invention can reflect the impact of changes in knowledge quality under a unified evaluation framework. Figure 7 shows the mean and error range of the comprehensive score at the four knowledge quality levels of 100%, 75%, 50%, and 35%. Figure 7 mainly demonstrates how the comprehensive score differentiates the knowledge quality levels on a unified scoring scale and provides a unified expression of the support capability for the generation link.
[0209] (3) Explanation of multi-level validation results: In the existing main experiment statistics, the overall average score of the E1 main experiment, including both hit and miss samples, was 99.20±4.32, 82.05±21.73, 58.38±28.63, and 46.50±29.13 for the four knowledge quality levels of 100%, 75%, 50%, and 35%, respectively; the corresponding hit rates were 100%, 78.0%, 48.0%, and 37.3%, respectively. Among them, there was a 35.55-point difference between the 75% level and the 35% level, and the difference was statistically significant (p<0.001). The above results are mainly used to illustrate the trend and technical effect of multi-level differentiation, and do not take any one set of example parameters as the sole limitation.
[0210] The related multi-level results are shown in Figures 5, 6, and 7. Figure 5 illustrates the correspondence between the hit rate and the comprehensive score; Figure 6 compares the level-wise monotonicity of the symbolic-side score, the neural-side score, and the comprehensive score; Figure 7 shows the mean and error range of the comprehensive score at each level. The above results illustrate that the present invention can form a differentiated and unified support capability scoring expression across different knowledge quality levels.
[0211] (4) Explanation of verification results by question type: In the verification by question type, the comprehensive scores of the time-change question type at the 75% and 35% levels were 74.70±23.09 and 47.32±27.48, respectively, an inter-level difference of 27.38; the comprehensive scores of the comparison question type at the 75% and 35% levels were 75.13±22.44 and 38.54±17.20, respectively, an inter-level difference of 36.59. The above results illustrate that the present invention can produce level differentiation results on a uniform scale in different generation task scenarios such as time change and object comparison.
[0212] (5) Component comparison and gated path verification: In the component comparison verification, when only symbolic side scoring is used, the difference between the 75% and 35% scores is 40.27; when only neural side scoring is used, the difference is 27.93; and the difference between the scores of the complete gated dual-path fusion framework is 35.55. This result illustrates that symbolic side evaluation and neural side evaluation reflect different dimensions of supporting information, while the complete framework can comprehensively characterize the knowledge support state and the answer behavior state under a unified evaluation semantic.
[0213] Furthermore, in the gated path validation, the inter-level difference of the complete framework is 35.55, while the inter-level difference after removing the gating mechanism is 34.10, resulting in a net improvement of 1.45 from the gating mechanism. These results illustrate that after switching paths based on query hit status, the comprehensive score can better distinguish the evaluation semantics under different support scenarios.
[0214] (6) Parameter Configuration Verification: The parameter configuration verification results show that, within a reasonable parameter range, the inter-level differentiation results and evaluation expressions formed by the present invention under different parameter settings generally maintain a comparable trend, indicating that the main conclusions of the present invention do not depend on a single fixed parameter. The implementation of related link control has been described in step 3 of Example 5, and will not be repeated in this example.
[0215] (7) Explanation of Supplementary Reference Comparison Results: To help illustrate the trend of the comprehensive score results, this implementation introduces B1 and B2 as reference comparison items. In the supplementary reference comparison at the two representative levels of 75% and 35%, the auxiliary error index of the comprehensive score was 12.54, lower than the 27.55 of B1, the 20.50 of B2, and the 21.06 of the symbolic-side score. The auxiliary error index of the neural-side score was 6.66, but since the reference result is correlated with the neural-side score in some behavioral characteristics, this result is only used as an auxiliary explanation. Relevant results are shown in Table 1.
[0216] Table 1. Comparison Results of Comprehensive Score and Reference Items at Representative Levels. Note: The auxiliary error index provides a supplementary explanation of the differences between the scoring methods and the reference results, and is not used as the main basis for determining the validity of this invention. B1 is the level label reference item, that is, the value corresponding to the current knowledge quality level is directly used as the reference output; B2 is the hit status reference item, that is, the reference output constructed from the query hit flag H and whether the result is empty. The auxiliary error index is calculated by comparison with the reference consistency result; since the reference result is correlated with the neural-side score in some behavioral characteristics, that result is only used as a supplementary explanation.
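The two reference items and an error index of this kind can be sketched as below. The sketch assumes that the auxiliary error index is a mean absolute error between each scoring method's outputs and the reference values, that B1 returns the level percentage directly, and that B2 returns 100 for a non-empty hit and 0 otherwise. The text does not specify the exact construction of the index or of B2, so all three function definitions are hypothetical.

```python
def mean_absolute_error(predicted, reference):
    """Mean absolute difference between two equally long score sequences."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def b1_level_label(level_percent: float) -> float:
    """B1 (assumed form): use the knowledge quality level itself as the output."""
    return float(level_percent)


def b2_hit_status(hit_flag: int, record_count: int) -> float:
    """B2 (assumed form): 100 if the query hit and returned records, else 0."""
    return 100.0 if hit_flag == 1 and record_count > 0 else 0.0
```

A lower index under this construction means that a scoring method tracks the reference values more closely. That matches how the comparison is read in the text, where it serves as supplementary rather than primary evidence.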
[0217] In summary, this embodiment demonstrates, through multi-level verification, question-type verification, component comparison verification, parameter configuration verification, and auxiliary reference comparison, that the gated dual-path unified evaluation mechanism established by this invention can consistently express the supporting capabilities of knowledge graphs in generation tasks, and its related technical effects and implementation results correspond to the beneficial effects.
[0218] Example 7: Explanation of Applicable Boundaries; This example is used to illustrate the detection boundaries of the method of the present invention.
[0219] (1) Value-level attribution error pollution scenario: This invention mainly evaluates the support capability of knowledge graphs for the generation link of large language models. For cases where the returned value has errors in entity attribution, time attribution, or attribute attribution, but the query support process remains consistent, this type of problem is closer to the scope of value-level fact truth verification, which can be implemented independently by additional modules and does not fall within the core detection scope of the current main scoring method of this invention.
[0220] (2) Abnormal scenario analysis: When the query is hit but the answer is still obviously uncertain, or when the query is not hit but the answer still gives a definite result, the neural side score will decrease accordingly, and further affect the comprehensive score result. This phenomenon shows that the method of the present invention can respond to abnormal generation behaviors such as query hit but overly cautious answer, and query not hit but still give a definite answer, thereby enhancing the ability to analyze generation behavior bias.
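The two abnormal cases named above follow from the consistency rule stated in claim 6: an answer is consistent when a hit is answered without refusal or uncertainty, or when a miss is answered with refusal or uncertainty. A minimal sketch of that rule is given below; the anomaly label strings are hypothetical names introduced for illustration.

```python
from typing import Optional


def is_consistent(hit_flag: int, refused: bool, uncertain: bool) -> bool:
    """Consistency between answer behavior and query result."""
    hedged = refused or uncertain
    return (hit_flag == 1 and not hedged) or (hit_flag == 0 and hedged)


def anomaly_label(hit_flag: int, refused: bool, uncertain: bool) -> Optional[str]:
    """Return None for consistent behavior, otherwise a label for the anomaly type."""
    if is_consistent(hit_flag, refused, uncertain):
        return None
    # Hit with a hedged answer, or miss with a definite answer.
    return "hit_but_overcautious" if hit_flag == 1 else "miss_but_definite"
```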
[0221] The aforementioned applicable boundaries do not affect the positioning of this invention as a method for evaluating the support capabilities of generation links; this invention is used to characterize the changes in the support capabilities of knowledge graphs for generation tasks.
[0222] KG: Knowledge Graph; RAG: Retrieval-Augmented Generation; LLM: Large Language Model; Cypher: Neo4j graph database query language; NDVI: Normalized Difference Vegetation Index. : symbolic-side score; : neural-side score; V: Comprehensive score; H: Knowledge visibility at the current level; N: Hit indicator; R: Number of returned records; U: Refusal indicator; C: Uncertainty indicator; G: Gating function output, used to trigger the corresponding evaluation path based on the query support status; E: Empty-result indicator; GID: Grid identifier; QL: Quality level set; MAE: Mean Absolute Error; E1: Main experiment number; B1: Level label reference item; B2: Hit status reference item.
Claims
1. A large model generation link control method based on knowledge graph quality assessment, characterized in that it comprises: obtaining a knowledge graph to be evaluated and a question set; for each question in the question set, generating and executing a structured query statement for querying the knowledge graph to obtain query results, and determining a query hit flag based on the query results, wherein the query hit flag indicates whether valid data is found in the knowledge graph; calculating a symbolic-side score based at least on the query hit flag and a knowledge visibility index, wherein the symbolic-side score characterizes the query support capability; inputting the question and the query results into a large language model to generate a natural language answer; extracting answer behavior features from the natural language answer, and determining the consistency status between the answer behavior and the query results based on the answer behavior features and the query hit flag, wherein the answer behavior features include at least whether the natural language answer is a refusal to answer and whether it is an expression of uncertainty; calculating a neural-side score in a scenario-adaptive manner based on the answer behavior features, the consistency status, and the query hit flag, wherein, between the query-hit scenario and the query-miss scenario, the neural-side score has opposite evaluation directions for definite answers, uncertain expressions, and refusals to answer; performing gated dual-path fusion on the symbolic-side score and the neural-side score based on the query hit flag to obtain a comprehensive score; and generating, based on the comprehensive score, control signals to control the large language model generation link.
2. The large model generation link control method based on knowledge graph quality assessment according to claim 1, characterized in that the knowledge visibility index is determined based on a preset knowledge quality level, and the symbolic-side score is obtained as a weighted sum of the knowledge visibility index and the query hit flag.
3. The large model generation link control method based on knowledge graph quality assessment according to claim 1, characterized in that the generating and executing of the structured query statement for querying the knowledge graph further comprises: acquiring a knowledge quality level configuration; and, based on the current knowledge quality level to be evaluated, adding a knowledge visibility constraint to the structured query statement so that the query is executed only within the knowledge range accessible at the current level, thereby obtaining query results and a query hit flag corresponding to the knowledge quality level.
4. The large model generation link control method based on knowledge graph quality assessment according to claim 3, characterized in that adding the knowledge visibility constraint to the structured query statement based on the current knowledge quality level to be evaluated comprises: identifying target node variables in the structured query statement that are directly related to the natural language answer; determining the entity type corresponding to each target node variable; determining the corresponding knowledge visibility constraint based on the current knowledge quality level; and adding the knowledge visibility constraint to the query condition part or the node attribute constraint part of the structured query statement.
5. The large model generation link control method based on knowledge graph quality assessment according to claim 1, characterized in that extracting the answer behavior features from the natural language answer comprises: matching the natural language answer against a preset set of refusal templates to obtain a refusal matching score, and determining that the natural language answer exhibits refusal behavior when the refusal matching score is greater than or equal to a first threshold; and matching the natural language answer against a preset set of uncertainty templates to obtain an uncertainty matching score, and determining that the natural language answer contains an uncertain expression when the uncertainty matching score is greater than or equal to a second threshold.
6. The large model generation link control method based on knowledge graph quality assessment according to claim 5, characterized in that determining the consistency status between the answer behavior and the query results based on the answer behavior features and the query hit flag comprises: when the query hit flag indicates a query hit and the natural language answer contains neither refusal behavior nor an uncertain expression, determining that the status is consistent; or, when the query hit flag indicates a query miss and the natural language answer contains refusal behavior or an uncertain expression, determining that the status is consistent; otherwise, determining that the status is inconsistent.
7. The large model generation link control method based on knowledge graph quality assessment according to claim 1, characterized in that calculating the neural-side score in a scenario-adaptive manner comprises: when the query hit flag indicates a query hit, giving a positive evaluation to a definite answer and a negative evaluation to an uncertain expression and to a refusal to answer; when the query hit flag indicates a query miss, giving a positive evaluation to an uncertain expression and to a refusal to answer and a negative evaluation to a definite answer; and using the consistency status as a scoring weighting factor to constrain the neural-side score to be consistent with the query support scenario.
8. The large model generation link control method based on knowledge graph quality assessment according to claim 1, characterized in that performing gated dual-path fusion on the symbolic-side score and the neural-side score based on the query hit flag to obtain the comprehensive score comprises: generating a gated output value using a smooth gating function that takes the query hit flag as input; and, using the gated output value as a dynamic weight, performing weighted fusion of the symbolic-side score and the neural-side score according to a first set of path weights and a second set of path weights to obtain the comprehensive score, wherein the first set of path weights corresponds to the query-supported scenario and the second set of path weights corresponds to the query-unsupported scenario.
9. The large model generation link control method based on knowledge graph quality assessment according to claim 1, characterized in that generating, based on the comprehensive score, control signals to control the large language model generation link comprises: when the query hit flag indicates a query hit, generating a release signal if the comprehensive score is higher than a preset release threshold, and generating a blocking signal or a review marker if the comprehensive score is lower than a preset safety threshold; and when the query hit flag indicates a query miss, generating no release signal, generating a knowledge support deficiency prompt signal and a review marker if the comprehensive score is not lower than the safety threshold, and generating a knowledge support deficiency prompt signal and a blocking signal if the comprehensive score is lower than the safety threshold.
10. The large model generation link control method based on knowledge graph quality assessment according to claim 9, characterized in that, when the neural-side score or the symbolic-side score is lower than a preset abnormality threshold, a status record marker is generated and stored together with the query hit flag and the comprehensive score for anomaly localization and statistical analysis.