A code context understanding method based on a large language model
By constructing a unified dynamic knowledge graph for projects, combined with multiple rounds of iterative optimization and multi-level verification, the problem that large language models cannot perceive dynamic semantics and historical logic in code processing has been solved, realizing a panoramic understanding and generation of code, and improving the efficiency and quality of software development.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: DELUMENGYU (HEBEI XIONGAN) TECHNOLOGY CO LTD
- Filing Date: 2026-04-02
- Publication Date: 2026-06-30
AI Technical Summary
Existing large language models cannot perceive the dynamic runtime semantics, historical evolution logic, and real-time engineering context of code during code processing. This results in generated code that is detached from the actual project environment and lacks architectural and task adaptability, making it difficult to uniformly handle multiple tasks such as code understanding, generation, optimization, and review.
A unified dynamic knowledge graph for projects is constructed. By parsing the abstract syntax tree of the source code, monitoring function calls and Git commit history in real time, a graph-structured knowledge graph containing static syntax, dynamic execution trajectory and historical evolution is formed. Combined with multiple rounds of iterative optimization and multi-level verification, a closed-loop learning of code generation and understanding is achieved.
It achieves a panoramic and penetrating understanding of the codebase, and the generated code has solid runtime correctness and performance guarantees. It can proactively warn of potential risks and improve the efficiency, quality and security of software development.
Figure CN122308901A_ABST
Description
Technical Field
[0001] This invention relates to the field of large language model technology, and more specifically, to a code context understanding method based on a large language model.
Background Technology
[0002] Large-scale language models refer to advanced artificial intelligence algorithms trained on massive amounts of data. Natural language processing systems with over 100 billion parameters can be used for customized AI applications such as content generation, text summarization, chatbots, coding, and predicting protein structures and biomolecular properties.
[0003] Existing code completion models and specialized vulnerability detection tools essentially treat code as a flat text sequence. They rely on statistical patterns that large language models learn from massive code corpora, which amounts to probabilistic imitation rather than true logical understanding. As a result, the model cannot perceive the dynamic runtime semantics, historical evolution logic, and real-time engineering context of the code, and it generates theoretical code or suggestions that are detached from the actual project environment. In addition, architectural and task adaptability is insufficient. Many models perform well on specific tasks but are limited by their architecture: pure encoder models are good at code understanding and retrieval, pure decoder models are good at code generation, and encoder-decoder models attempt to combine the two but still struggle to achieve optimal performance on specific tasks. This fragmentation makes it difficult to build an agent that can uniformly handle multiple tasks such as code understanding, generation, optimization, and review.
Summary of the Invention
[0004] In view of the problems existing in the prior art, the purpose of this invention is to provide a code context understanding method based on a large language model. In addition to code context understanding, the invention supports semantic understanding and engineering collaboration, and it is capable of lifelong autonomous learning and evolution.
[0005] To solve the above problems, the present invention adopts the following technical solution: A code context understanding method based on a large language model includes:
Step S1: Construct a unified project dynamic knowledge graph. Parse the source code of the target project and construct its abstract syntax tree as the static semantic foundation. Embed a monitoring agent in the code execution environment to collect function call sequences, variable state changes, and exception trigger points in real time as dynamic execution trajectories. Parse the project's Git commit history and construct a historical graph with commits as nodes and change relationships as edges. Through entity alignment, integrate and store the three types of heterogeneous data (static syntax structure, dynamic execution trajectory, and historical evolution) in a unified, graph-structured project dynamic knowledge graph.
Step S2: Dynamic execution trajectory feedback and evolutionary optimization based on the knowledge graph. When a code generation or understanding task is received, multiple candidate code solutions are retrieved and generated based on the current real-time development environment state and the project dynamic knowledge graph. For each candidate solution, its executable part is extracted and executed against the provided test cases or in a sandbox environment to obtain, as feedback, a structured execution trajectory containing variable states and runtime behavior. Using this feedback, the candidate solutions are iteratively optimized through revision, reorganization, and refinement operations so that the solutions continuously evolve along the reasoning trajectory.
Step S3: Receive the final code solution produced by the evolutionary optimization and perform multi-level closed-loop verification on it, combining static analysis and runtime verification. When the verification succeeds, the new execution trajectory and solution generated by this task, together with their associations with entities in the knowledge graph, are structured. Based on the verification results and predefined update strategies, the processed information is fed back into the project dynamic knowledge graph, thereby realizing a continuous learning closed loop from environmental awareness and code optimization to knowledge accumulation.
[0006] Compared with the prior art, the advantages of this invention are: This invention constructs a multi-dimensional digital twin of code through a unified project dynamic knowledge graph. It does not simply parse code text, but deeply integrates and aligns static syntax, dynamic behavior, and historical evolution with entities. For the first time, AI collaborators can have a panoramic and penetrating understanding of the codebase, just like senior human engineers. It can not only understand function signatures, but also know how the function is called online, where the performance bottlenecks are, and how many major refactorings it has undergone and the reasons for them. Based on this graph, evolutionary optimization elevates the optimization object from fragile code text to verifiable execution trajectories. Through a closed loop of multiple rounds of generation, execution, evaluation, and revision, it ensures that the output solution has solid runtime correctness, performance guarantees, and strong consistency with the project architecture. This completely reverses the predicament of AI-generated code that looks correct but runs incorrectly, making its output reach a trustworthy and deliverable production-grade standard.
[0007] This invention ensures work quality through rigorous multi-level verification. Furthermore, through an intelligent graph self-updating mechanism, it structurally feeds back and solidifies every successful fix, efficient optimization strategy, and newly discovered code pattern into the project's dynamic knowledge graph, subject to high-confidence and consistency checks. It introduces neuroscience-inspired dynamic weight adjustment and cybernetics-based feedback regulation mechanisms, allowing knowledge connections within the graph to dynamically strengthen or weaken based on usage frequency and effectiveness. The entire system automatically balances exploring new knowledge with maintaining accuracy. As the system accompanies project development, its professionalism, insight, and collaborative efficiency will increase exponentially over time. It evolves from a general programming assistant into a super expert with a deep understanding of the project's specific architecture, specifications, defect patterns, and team habits. It can proactively warn of potential risks and recommend solutions best suited to the project context, truly realizing the transformation from assisted programming to augmented intelligence, bringing a leap forward in the efficiency, quality, and security of software development.
Attached Figure Description
[0008] Figure 1 is a flowchart illustrating the steps of a code context understanding method based on a large language model according to the present invention. Figure 2 is a flowchart illustrating the specific steps involved in constructing a unified project dynamic knowledge graph in the code context understanding method based on a large language model according to the present invention.
Detailed Implementation
[0009] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, and not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort are within the scope of protection of the present invention.
[0010] Example: Referring to Figures 1-2, a code context understanding method based on a large language model includes:
Step S1: Construct a unified project dynamic knowledge graph. Parse the source code of the target project and construct its abstract syntax tree as the static semantic foundation. Embed a monitoring agent in the code execution environment to collect function call sequences, variable state changes, and exception trigger points in real time as dynamic execution trajectories. Parse the project's Git commit history and construct a historical graph with commits as nodes and change relationships as edges. Through entity alignment, integrate and store the three types of heterogeneous data (static syntax structure, dynamic execution trajectory, and historical evolution) in a unified, graph-structured project dynamic knowledge graph.
Step S2: Dynamic execution trajectory feedback and evolutionary optimization based on the knowledge graph. When a code generation or understanding task is received, multiple candidate code solutions are retrieved and generated based on the current real-time development environment state and the project dynamic knowledge graph. For each candidate solution, its executable part is extracted and executed against the provided test cases or in a sandbox environment to obtain, as feedback, a structured execution trajectory containing variable states and runtime behavior. Using this feedback, the candidate solutions are iteratively optimized through revision, reorganization, and refinement operations so that the solutions continuously evolve along the reasoning trajectory.
Step S3: Receive the final code solution produced by the evolutionary optimization and perform multi-level closed-loop verification on it, combining static analysis and runtime verification. When the verification succeeds, the new execution trajectory and solution generated by this task, together with their associations with entities in the knowledge graph, are structured. Based on the verification results and predefined update strategies, the processed information is fed back into the project dynamic knowledge graph, thereby realizing a continuous learning closed loop from environmental awareness and code optimization to knowledge accumulation.
[0011] In a specific embodiment of the present invention, a unified dynamic knowledge graph for the project is constructed. This graph is not a static code snapshot, but rather integrates the static syntactic structure, dynamic execution behavior, and historical evolution of the project. These three types of heterogeneous data are aligned with entities to form a graph-structured, active project memory. When processing code tasks, the method generates candidate solutions based on this graph and the real-time environment state, and introduces an evolutionary optimization mechanism based on execution trajectory feedback. Through multiple rounds of revision, reorganization, refinement, and iteration, the code solutions continuously evolve in real-world running feedback. Finally, through closed-loop verification and graph self-updating, successful experiences are structurally fed back to the knowledge graph, forming a continuous learning closed loop from perception and optimization to knowledge accumulation.
[0012] Specifically, step S1 includes:
S101. Static Syntax Structure Parsing: Parse the source code of the target project and construct its Abstract Syntax Tree (AST) through lexical analysis, syntax analysis, and semantic analysis. The nodes of the AST represent non-terminal and terminal symbols in the program's syntax structure, and the edges represent syntactic relationships.
S102. Dynamic Execution Trajectory Acquisition: A monitoring agent is embedded in the code execution environment of the target project to collect function call sequences, variable state changes, and exception trigger points in real time, forming a dynamic execution trajectory. The monitoring agent generates a globally unique tracing identifier through distributed tracing technology and passes it along the call chain to associate call events across processes or threads. By intercepting thread method calls or inter-thread communication primitives, it captures the explicit and implicit context-switching information of threads as triplets of the form (operating thread, action, operated thread).
S103. Construction of Historical Evolution: Analyze the Git commit history of the target project, abstract each commit as a graph node, and abstract the parent-child relationship between commits as directed edges, constructing a historical directed acyclic graph (DAG) with commits as nodes and change relationships as edges. Each commit node uses the SHA-1 algorithm to hash the commit metadata and generate a unique 40-digit hexadecimal commit hash as its identifier. The commit metadata includes the committer, timestamp, commit message, parent commit hash, and tree object pointer.
S104. Heterogeneous Data Alignment and Unified Knowledge Graph Construction: The abstract syntax tree obtained in step S101, the dynamic execution trajectory obtained in step S102, and the historical evolution DAG obtained in step S103 are aligned by entity and then integrated and stored in a unified, graph-structured project dynamic knowledge graph. The unified knowledge graph adopts a hierarchical structure, including a plot subgraph for storing original execution fragments, a semantic subgraph for storing entities and their relationships extracted from the plot subgraph, and a community subgraph for representing entity communities. A graph attention network (GAT) or graph convolutional network (GCN) is used to perform representation learning and reasoning on the integrated knowledge graph.
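As a rough illustration of the static layer in S101, the following Python sketch uses the standard `ast` module to turn source text into parent-child edges and a per-function list of callees. Python stands in here for whatever language the target project uses; `ast_edges` and `function_calls` are hypothetical helper names, and the graph schema actually used by the method is not specified in the text.

```python
import ast


def ast_edges(source):
    """Return (parent node type, child node type) pairs for every AST edge."""
    tree = ast.parse(source)
    edges = []
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            edges.append((type(parent).__name__, type(child).__name__))
    return edges


def function_calls(source):
    """Map each function definition to the sorted names it calls directly."""
    tree = ast.parse(source)
    calls = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            callees = {
                inner.func.id
                for inner in ast.walk(node)
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name)
            }
            calls[node.name] = sorted(callees)
    return calls
```

The callee map is the kind of static relationship that could later be matched against runtime call sequences during entity alignment.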
[0013] In a specific embodiment of the present invention, static parsing uses lexical, syntactic, and semantic analysis to generate an AST, laying the syntactic foundation for the code. Dynamic data collection, through distributed tracing and thread context capture, accurately records runtime behavior across processes and threads, forming high-fidelity execution trajectories. History construction abstracts Git commits into a DAG and uses SHA-1 hashing to ensure node uniqueness, fully preserving the code's evolution logic. The final heterogeneous data alignment is crucial; it aligns and merges entities from different dimensions through similarity calculation and stores them in a hierarchical graph containing plot subgraphs, semantic subgraphs, and community subgraphs. Then, graph neural networks are used for representation learning and reasoning, achieving deep fusion and standardized representation of multi-source heterogeneous data. Through distributed tracing and context capture, the collection of dynamic trajectories reaches the accuracy of industrial-grade observability tools, overcoming the shortcomings of traditional LLMs in being unable to perceive runtime states. Modeling Git history as a DAG and calculating generational numbers introduces a time dimension and causal reasoning capabilities to code understanding. The application of hierarchical graph structures and graph neural networks enables the system not only to store facts but also to perform deep relational reasoning and community discovery, transforming fragmented information into a structured knowledge system that can be directly reasoned about by machines, providing a solid and computable data foundation for subsequent intelligent decision-making.
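The 40-character commit identifier can be reproduced with the standard library. The sketch below follows Git's object-hashing convention (SHA-1 over a `<type> <size>\0` header followed by the object body). The sample commit body, the author details, and the helper name `git_object_hash` are illustrative assumptions, not data from the patent.

```python
import hashlib


def git_object_hash(kind: str, body: bytes) -> str:
    """Hash an object the way Git does: SHA-1 of '<kind> <size>\\0' + body."""
    header = f"{kind} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


# Hypothetical root commit: a tree pointer, committer metadata, and a message.
# The tree id used here is Git's well-known empty-tree hash.
commit_body = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author Example Dev <dev@example.com> 1700000000 +0000\n"
    b"committer Example Dev <dev@example.com> 1700000000 +0000\n"
    b"\n"
    b"Initial commit\n"
)
commit_id = git_object_hash("commit", commit_body)
```

Because the parent hash is part of the hashed body for non-root commits, any change to an ancestor changes every descendant identifier, which is what makes the commit graph tamper-evident.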
[0014] Specifically, the construction of the abstract syntax tree in step S101 employs either a top-down or a bottom-up parsing method. When top-down parsing is used, the grammar is modeled by production rules over a start symbol S, non-terminal symbols, and terminal symbols; the parser recursively derives from the start symbol S until all input terminal symbols are matched, thus constructing the AST. When bottom-up parsing is used, the grammar is modeled by the same kind of production rules applied in reverse; the parser starts with the input terminal symbol sequence and uses a stack structure to reduce it repeatedly until it is reduced to the start symbol, completing the AST construction.
[0015] In a specific embodiment of the present invention, two classic parsing algorithms and their formal models are used to construct the AST. Top-down parsing starts from the grammar's start symbol and recursively applies production rules, attempting to derive a syntax tree that exactly matches the input token sequence. Bottom-up parsing starts from the input sequence and uses a stack structure to continuously reduce matching terminal and non-terminal symbols according to production rules until only the start symbol remains in the stack. These two methods are the core of compiler front-end technology, ensuring a precise and unambiguous formal description of the code's syntactic structure.
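To make the top-down case concrete, here is a minimal recursive-descent parser sketch in Python. The toy arithmetic grammar (`Expr`, `Term`, `Factor`), the tuple-based AST shape, and all function names are assumptions chosen for illustration; the patent does not fix a grammar.

```python
import re

# Assumed toy grammar:
#   Expr   -> Term (('+' | '-') Term)*
#   Term   -> Factor (('*' | '/') Factor)*
#   Factor -> NUMBER | '(' Expr ')'
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):  # derive Expr from the start symbol
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        token = self.take()
        if token == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is None or not token.isdigit():
            raise SyntaxError(f"unexpected token {token!r}")
        return ("num", int(token))


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return tree
```

Each method corresponds to one non-terminal, so the call stack mirrors the derivation from the start symbol down to the terminals.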
[0016] Specifically, the implementation of the monitoring agent in step S102 includes: for a bytecode-based language environment, the agent is implemented through an instrumentation interface that injects monitoring logic during bytecode loading; for a language environment that supports wrapper objects, a wrapper object wraps the target function, and an interceptor on that object records the function call parameters and return values. The acquisition metrics for the dynamic execution trajectory include function call frequency, average response time, exception throw stack, and variable value sequence. A time window threshold is set so that only the trajectory data within the most recent time period is retained, which controls the storage scale.
[0017] In specific embodiments of the present invention, cross-platform, low-intrusion runtime monitoring is achieved. By utilizing the native mechanisms of various language platforms, the monitoring agent can acquire rich dynamic semantic information with extremely low performance loss, transforming dynamic execution trajectories from a concept into an engineering-achievable characteristic. The collected multi-dimensional indicators provide quantitative basis for evaluating code performance, discovering potential bottlenecks and abnormal patterns, while the time window threshold strategy ensures that the system has good scalability and can run for a long time in large, active projects without causing data explosion. It balances knowledge freshness and system burden, providing key data flow guarantees for building a continuously evolving dynamic knowledge graph.
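The wrapper-and-interceptor idea and the time-window retention policy can be sketched in Python with a decorator. This is a stand-in for the platform-specific mechanisms the text refers to, not a reproduction of them; `TraceRecorder`, `window_seconds`, and the injectable `clock` are hypothetical names introduced for the example.

```python
import functools
import time
from collections import defaultdict, deque


class TraceRecorder:
    """Records call events and keeps only those inside a sliding time window."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock
        # Each event: (end timestamp, function name, duration, exception type or None)
        self.events = deque()

    def _prune(self, now):
        while self.events and now - self.events[0][0] > self.window_seconds:
            self.events.popleft()

    def monitor(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = self.clock()
            error = None
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                error = type(exc).__name__
                raise
            finally:
                end = self.clock()
                self.events.append((end, func.__name__, end - start, error))
                self._prune(end)

        return wrapper

    def metrics(self):
        """Call count, average duration, and exception count per function."""
        self._prune(self.clock())
        stats = defaultdict(lambda: {"calls": 0, "total": 0.0, "exceptions": 0})
        for _, name, duration, error in self.events:
            entry = stats[name]
            entry["calls"] += 1
            entry["total"] += duration
            entry["exceptions"] += int(error is not None)
        return {
            name: {
                "calls": entry["calls"],
                "avg_time": entry["total"] / entry["calls"],
                "exceptions": entry["exceptions"],
            }
            for name, entry in stats.items()
        }
```

Decorating a function with `@recorder.monitor` is enough to start collecting call frequency, average response time, and exception counts; older events fall out of the deque as the window advances.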
[0018] Specifically, the construction and optimization of the historical evolution DAG in step S103 includes: Perform a topological sort on the DAG to ensure that all parent nodes have been processed before any child node is processed. The sorting time complexity is $O(V + E)$, where $V$ is the number of nodes and $E$ is the number of edges. Calculate a generation number for each commit node. The generation number is an integer based on the topological depth, or a corrected value combined with the commit timestamp, and is used to quickly determine reachability between nodes: if the generation number of node A is less than that of node B, then A cannot be an ancestor of B. A Bloom filter is used to accelerate queries for a specific path in the DAG, and the false positive rate threshold of the Bloom filter is set to be less than 0.1%.
[0019] In a specific embodiment of the present invention, topological sorting is used to ensure that the processing order of commit nodes conforms to historical dependencies. The generation number is introduced as a heuristic: each commit node is assigned an integer identifier based on topological depth or time. By utilizing the property that if the generation number of node A is less than that of B, then A cannot be an ancestor of B, fast pruning can be performed when querying reachability between nodes, avoiding expensive full-graph traversal. In addition, the Bloom filter, a probabilistic data structure, is used to quickly determine with minimal memory overhead that a certain commit or path definitely does not exist in the DAG, further accelerating the query.
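A minimal Python sketch of the ordering and pruning steps follows. It assumes one particular convention: generation numbers grow from root commits toward newer commits (1 plus the largest parent generation, as in Git's commit-graph file), so a commit whose generation is not smaller than another's cannot be that commit's ancestor. If generations are counted from the newest commit backwards instead, the comparison flips. The function names and the `parents` mapping are hypothetical.

```python
from collections import deque


def topo_order_and_generations(parents):
    """Kahn's algorithm over a commit DAG.

    `parents` maps each commit id to the list of its parent ids; every
    parent must itself be a key. Runs in O(V + E).
    """
    children = {commit: [] for commit in parents}
    pending = {commit: len(ps) for commit, ps in parents.items()}
    for commit, ps in parents.items():
        for parent in ps:
            children[parent].append(commit)

    queue = deque(sorted(c for c, count in pending.items() if count == 0))
    order, generation = [], {}
    while queue:
        commit = queue.popleft()
        order.append(commit)
        generation[commit] = 1 + max(
            (generation[p] for p in parents[commit]), default=0
        )
        for child in children[commit]:
            pending[child] -= 1
            if pending[child] == 0:
                queue.append(child)
    if len(order) != len(parents):
        raise ValueError("commit graph contains a cycle")
    return order, generation


def is_ancestor(a, b, parents, generation):
    """True if commit `a` is a strict ancestor of commit `b`."""
    if generation[a] >= generation[b]:
        return False  # pruned without touching the graph
    stack, seen = [b], set()
    while stack:
        node = stack.pop()
        for parent in parents[node]:
            if parent == a:
                return True
            # Only walk through commits that could still have `a` above them.
            if parent not in seen and generation[parent] > generation[a]:
                seen.add(parent)
                stack.append(parent)
    return False
```

The generation check answers many negative queries in constant time and also bounds how far the fallback traversal has to walk.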
[0020] Specifically, the entity alignment in step S104 employs an algorithm based on similarity calculation: For code entities from the AST, execution entities from the dynamic trajectory, and change entities from the commit history, calculate the cosine similarity of their names, contexts, or features. When the cosine similarity exceeds a preset threshold, they are identified as the same entity and merged in the unified graph, and their static definitions, dynamic instances, and historical change edges are connected using this entity as a hub.
[0021] In a specific embodiment of the present invention, the automatic association and fusion of cross-dimensional knowledge is realized, which is the key technology for constructing a unified graph. It avoids the errors caused by relying on manual rules or fragile string matching. Through quantifiable similarity calculation and adjustable threshold, flexible and accurate entity alignment is achieved. This allows the static signature, dynamic performance and historical modification records of the same function to be organically linked together, forming a complete digital twin archive of the entity. This deep fusion makes subsequent reasoning possible and truly releases the value of multi-source data aggregation.
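The alignment rule can be sketched in Python using a bag-of-tokens representation of entity names. The tokenization scheme, the threshold value 0.8, and the helper names are assumptions made for the example; the text only requires that cosine similarity be compared against a preset threshold.

```python
import math
import re
from collections import Counter


def name_tokens(name):
    """Split snake_case / camelCase identifiers into lowercase token counts."""
    return Counter(part.lower() for part in re.findall(r"[A-Za-z][a-z0-9]*", name))


def cosine(a, b):
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def align(static_names, runtime_names, threshold=0.8):
    """Map each static entity to its best runtime match above the threshold."""
    merged = {}
    for static in static_names:
        best, best_score = None, 0.0
        for runtime in runtime_names:
            score = cosine(name_tokens(static), name_tokens(runtime))
            if score > best_score:
                best, best_score = runtime, score
        if best is not None and best_score > threshold:
            merged[static] = best
    return merged
```

In practice the feature vectors would more likely come from learned embeddings of names and surrounding context; the thresholded comparison stays the same.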
[0022] Specifically, step S2 includes:
S201. Context-Aware Candidate Solution Generation: Upon receiving a code generation or understanding task, and based on the current real-time development environment state, retrieve from the project's dynamic knowledge graph the historical code fragments, execution trajectories, and fix cases that are semantically similar to the task description. The retrieved context is used to prompt the large language model, and a nucleus (kernel) sampling strategy with temperature parameter T generates N initial candidate code solutions in parallel, ensuring the diversity and novelty of the generated results.
S202. Acquisition and Quantitative Evaluation of Structured Execution Trajectory: Each candidate code solution generated in step S201 is executed against the provided test case set or in an isolated sandbox environment. Its complete runtime behavior is monitored and recorded to form a structured execution trajectory containing the function call sequence, key variable value changes, resource consumption, and exception information. Based on this trajectory, a multi-dimensional evaluation module calculates a quantitative score for each candidate solution. The multi-dimensional evaluation module includes a static analysis scorer, a dynamic execution scorer, and a graph consistency scorer.
S203. Trajectory-Based Evolutionary Iterative Optimization: Using the execution trajectory and quantitative score obtained in step S202 as feedback signals, the candidate solution pool is iteratively optimized over multiple rounds. The optimization process involves three core operations (revision, reorganization, and refinement), which enable the solutions to evolve continuously at the level of the reasoning trajectory rather than at the level of code text.
S204. Selection and Verification of the Final Solution: The optimization loop terminates after a preset number of evolutionary iterations, or when the evaluation score of the best solution exceeds the success threshold. The best solution is then selected from the final solution pool based on the comprehensive evaluation score and verified again in a complete test suite and simulation environment to ensure its functional correctness, performance compliance, and compatibility with the project architecture.
[0023] In a specific embodiment of the present invention, during the candidate solution generation stage, the knowledge graph is used for retrieval-augmented generation, and a nucleus (kernel) sampling strategy controlled by temperature parameter T is employed to generate multiple solutions in parallel, balancing diversity and quality. In the quantitative evaluation stage, a comprehensive reward function is used to score the execution trajectory of each solution. This function jointly considers functional correctness, reasoning quality, cost efficiency, and graph consistency. In the evolutionary optimization stage, low-scoring solutions are revised, high-scoring elite solutions are reorganized, and robust solutions on which multiple independent solutions have reached consensus are selected through refinement operations based on K-means clustering.
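The sampling step can be illustrated with a small Python sketch of temperature-scaled nucleus (top-p) sampling over a vector of logits. The cumulative-probability cutoff `top_p` and its default value are assumptions, since the text names only the temperature parameter T; a real system would apply this inside the language model's decoding loop rather than on a hand-written list.

```python
import math
import random


def nucleus_sample(logits, temperature=0.8, top_p=0.9, rng=random):
    """Sample one token index from the smallest set whose mass reaches top_p."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(value - peak) for value in scaled]
    total = sum(weights)
    probs = [weight / total for weight in weights]

    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for index in ranked:
        nucleus.append(index)
        mass += probs[index]
        if mass >= top_p:
            break
    # random.choices renormalizes the weights within the nucleus.
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]
```

Raising the temperature flattens the distribution and widens the nucleus, which is how parallel candidates end up differing from one another.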
[0024] Specifically, the comprehensive reward function $R_t$ of the multi-dimensional evaluation module described in step S202 is calculated as a weighted sum of four rewards:
$$R_t = w_1 R_{func} + w_2 R_{reason} + w_3 R_{cost} + w_4 R_{graph}$$
where:
- $R_t$ is the comprehensive reward value (dimensionless), representing the quality of the candidate solution in iteration round $t$.
- $R_{func}$ is the functional correctness reward (dimensionless). Its core calculation is a pass-rate indicator derived from the pass / fail results of the execution trajectory on the test suite: for $c$ candidate solutions of which $s$ pass the tests, the single-round pass probability is estimated as $s / c$. This metric directly measures the actual utility of the code.
- $R_{reason}$ is the reasoning quality reward (dimensionless), calculated by analyzing the control flow logic coherence, data flow integrity, and exception handling robustness in the execution trajectory. It can be quantified by an LLM evaluator or a rule set.
- $R_{cost}$ is the cost efficiency reward, expressed as the reciprocal of the time in seconds or of the memory / CPU cycles consumed. It is calculated from the actual time or resource consumption of this execution and the evolutionary iteration round $t$, with an attenuation coefficient that encourages faster convergence as iterations progress.
- $R_{graph}$ is the graph consistency reward (dimensionless), calculated as the mean cosine similarity between the execution trajectory pattern of this solution and the historical successful trajectories stored in the knowledge graph for the same module and the same anomaly type.
- $w_1, w_2, w_3, w_4$ are the weighting coefficients, satisfying $w_1 + w_2 + w_3 + w_4 = 1$. $w_1$ is usually assigned the highest weight ($w_1 \ge 0.5$), because ensuring functional correctness is the core objective.
The revision operation described in step S203 is as follows: candidate solutions whose evaluation scores fall below the revision threshold are selected, and a revision process is initiated. This process utilizes the reflective capabilities of a large language model. The code of the current solution and its failing or inefficient execution trajectory are used as input to prompt the model to analyze the defective links in the trajectory and generate revision instructions. The revision follows a chain of reasoning of localization, diagnosis, and patching, and the revised code must pass the core test cases that caused the original solution to fail. This process mimics the self-adjustment and reflection mechanism of an intelligent agent.
The reorganization operation described in step S203 uses an evolutionary search algorithm based on execution rewards to select the top M solutions by reward value from the current solution pool as the elite group, where M corresponds to the top 30% of the solutions.
The refinement operation described in step S203 re-evaluates all solutions in the pool after each evolutionary iteration and then selects among them with a ranking algorithm based on code test consistency. This algorithm treats the execution results of each solution on all test cases as a vector and identifies consensus sets through K-means clustering. Selection considers not only the reward value of an individual solution but also the size of its consensus set and the average reward value of that set, prioritizing solutions that have received consensus support from multiple other independent solutions.
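The weighted reward and the elite selection can be sketched in Python as follows. The default weights, the decay value, and in particular the form of `cost_reward` (a decayed reciprocal of consumption) are assumptions, because the text gives only the ingredients of the cost term; `Scores`, `combined_reward`, and `elite` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    functional: float  # fraction of tests passed, in [0, 1]
    reasoning: float   # reasoning-quality score, in [0, 1]
    cost: float        # cost-efficiency score, in [0, 1]
    graph: float       # graph-consistency score, in [0, 1]


def pass_rate(passed, total):
    """Single-round pass estimate: s passing candidates out of c."""
    return passed / total if total else 0.0


def cost_reward(consumption, iteration, decay=0.9):
    """ASSUMED form: reciprocal of consumption, attenuated per iteration."""
    return (decay ** iteration) / (1.0 + consumption)


def combined_reward(scores, weights=(0.5, 0.2, 0.1, 0.2)):
    """Weighted sum of the four rewards; weights sum to 1 and w1 >= 0.5."""
    if abs(sum(weights) - 1.0) > 1e-9 or weights[0] < 0.5:
        raise ValueError("weights must sum to 1 with w1 >= 0.5")
    w1, w2, w3, w4 = weights
    return (
        w1 * scores.functional
        + w2 * scores.reasoning
        + w3 * scores.cost
        + w4 * scores.graph
    )


def elite(pool, percent=30):
    """Keep the top `percent` of (solution, reward) pairs, at least one."""
    ranked = sorted(pool, key=lambda item: item[1], reverse=True)
    keep = max(1, (len(ranked) * percent) // 100)
    return ranked[:keep]
```

Keeping the functional weight at or above 0.5 means no combination of the other three rewards can outrank a solution that passes substantially more tests.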
[0025] In a specific embodiment of the present invention, through mathematical modeling and algorithm definition, this method transforms the seemingly subjective code optimization process into a quantifiable, controllable, and interpretable automated process. The weighting of the reward function ensures the core position of functional correctness, while the graph consistency reward enables the generated solution to automatically align with the project's historical best practices and architectural constraints, avoiding the introduction of technical debt. The revision threshold and consensus set mechanism provide dual protection, both promptly fixing obvious defects and avoiding accidental errors through statistical methods. This design makes the entire optimization process not only efficient but also highly deterministic and predictable, making it suitable for integration into a rigorous engineering development pipeline.
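The consensus-set idea can be sketched in Python as below. Note the simplification: instead of K-means clustering, this sketch groups solutions whose pass / fail vectors are exactly identical, which is the degenerate case of clustering on binary outcome vectors. The group score (set size times mean reward) and the function name are assumptions.

```python
from collections import defaultdict


def select_by_consensus(results, rewards):
    """Pick the best-rewarded solution from the strongest consensus set.

    `results` maps solution name -> list of per-test outcomes (1 pass, 0 fail).
    `rewards` maps solution name -> comprehensive reward value.
    """
    groups = defaultdict(list)
    for name, outcomes in results.items():
        groups[tuple(outcomes)].append(name)

    def group_score(members):
        mean_reward = sum(rewards[m] for m in members) / len(members)
        return len(members) * mean_reward  # assumed scoring rule

    best_group = max(groups.values(), key=group_score)
    return max(best_group, key=lambda member: rewards[member])
```

A solution with a high individual reward but no agreeing peers can therefore lose to a slightly lower-scoring solution whose behavior is reproduced by several independent candidates.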
[0026] Specifically, the multi-level closed-loop verification in step S3 includes constructing a verification pipeline consisting of a static verification module and a dynamic verification module. The static verification module analyzes the code without executing it: it generates an abstract syntax tree through lexical analysis and syntax analysis, generates a directed control flow graph through control flow analysis, traverses the control flow graph for data flow analysis, and applies abstract interpretation techniques. The dynamic verification module executes the code in a controlled environment. The results of static verification and dynamic verification are cross-checked; if a conflict is found, re-analysis is triggered or the verification is judged to have failed, thus forming a verification closed loop.
The conditions for passing the verification are determined by a set of quantifiable threshold indicators: the number of static analysis defects must be lower than a threshold, which is a positive integer set based on historical project data; the number of critical security vulnerabilities must be zero; the code complexity, measured as cyclomatic complexity, must be below a preset threshold, where the cyclomatic complexity is calculated as $V(G) = E - N + 2P$, with $E$ the number of edges in the control flow graph, $N$ the number of nodes, and $P$ the number of connected components, all of which are dimensionless; the runtime assertion pass rate must reach a preset percentage, which is 100%; and the performance metrics must lie within a range around the historical baseline value defined by a preset percentage.
The algorithm and metrics for feeding back and updating the processed information to the project's dynamic knowledge graph are as follows. A fusion confidence is calculated for each new knowledge unit extracted from the code solution, as the weighted sum of the static verification confidence, the dynamic verification confidence, and the source authority, where the three weighting coefficients are dimensionless quantities between 0 and 1. Only when the fusion confidence is higher than the preset confidence threshold is the new knowledge unit allowed to enter the fusion process. During fusion, consistency verification checks whether the new relationship conflicts with the existing relationships in the graph in terms of transitive closure. If the number of detected conflicts exceeds a preset threshold, the conflict resolution sub-process is triggered; otherwise, fusion is executed.
The weights of the relationships between entities in the knowledge graph are dynamically adjusted. The connection weight between entity nodes $i$ and $j$ is adjusted iteratively based on an improved Hebb rule, simulating the reinforcement and forgetting mechanisms of neural networks. The weight update formula is expressed in terms of the activation levels of entity nodes $i$ and $j$ within the most recent time window $T$, a learning-rate reinforcement coefficient, and a forgetting coefficient. The activation level is quantified by the frequency with which the entity is associated in queries, verification tasks, or code solutions, and is a dimensionless positive number; the learning-rate reinforcement coefficient and the forgetting coefficient are preset positive decimals; the weights are dimensionless. When a successful closed-loop verification task reinforces an association, the activation of the related entities increases, which increases their connection weight. Relationships that are activated frequently are strengthened, while those that are not activated for a long time gradually weaken.
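As a non-limiting illustration, the threshold-based pass conditions above can be sketched in Python as follows. The cyclomatic complexity uses the standard definition (edges minus nodes plus twice the number of connected components). The class names, field names and all default threshold values are hypothetical, and treating the performance range as a symmetric percentage deviation from the baseline is an assumption.

```python
from dataclasses import dataclass


def cyclomatic_complexity(edges: int, nodes: int, components: int = 1) -> int:
    """Standard cyclomatic complexity of a control flow graph: E - N + 2P."""
    return edges - nodes + 2 * components


@dataclass(frozen=True)
class VerificationReport:
    static_defects: int
    critical_vulnerabilities: int
    cfg_edges: int
    cfg_nodes: int
    cfg_components: int
    assertion_pass_rate: float    # fraction in [0, 1]
    performance: float            # e.g. latency in milliseconds
    baseline_performance: float   # historical baseline, same unit


@dataclass(frozen=True)
class Thresholds:
    # All values are hypothetical placeholders.
    max_static_defects: int = 5
    max_complexity: int = 10
    required_pass_rate: float = 1.0          # 100 %
    performance_tolerance_pct: float = 10.0  # assumed symmetric band


def passes_verification(report: VerificationReport,
                        thresholds: Thresholds = Thresholds()) -> bool:
    """Return True only if every quantitative pass condition holds."""
    complexity = cyclomatic_complexity(
        report.cfg_edges, report.cfg_nodes, report.cfg_components
    )
    deviation_pct = (
        abs(report.performance - report.baseline_performance)
        / report.baseline_performance * 100.0
    )
    return (
        report.static_defects < thresholds.max_static_defects
        and report.critical_vulnerabilities == 0
        and complexity < thresholds.max_complexity
        and report.assertion_pass_rate >= thresholds.required_pass_rate
        and deviation_pct <= thresholds.performance_tolerance_pct
    )


if __name__ == "__main__":
    report = VerificationReport(2, 0, 9, 8, 1, 1.0, 105.0, 100.0)
    print(passes_verification(report))
```

Because the conditions are combined with a logical AND, a single critical vulnerability or a single failed assertion is enough to reject the solution regardless of the other indicators.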
[0027] In a specific embodiment of the present invention, multi-level verification combines static analysis and dynamic verification. The results of the two are mutually verified to form a closed loop. The verification pass standard is strictly defined by a series of quantitative thresholds. The knowledge update decision introduces fusion confidence. Only new knowledge above the threshold is allowed to be updated, and consistency checks are performed to prevent conflicts. The dynamic adjustment of weights adopts the improved Hebb rule, which allows the weights of relationships between entities to be dynamically strengthened or weakened according to the frequency of use, simulating the long-term memory and forgetting mechanism of the human brain.
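As a non-limiting illustration of the fusion confidence and the weight adjustment summarized above, the following Python sketch uses one plausible form of a Hebbian update with forgetting, in which the weight grows with the product of the two activation levels and decays in proportion to its current value. This specific update form, the function names, and all numeric values (weighting coefficients, confidence threshold, learning rate, forgetting coefficient) are assumptions and are not taken from the embodiment.

```python
def fusion_confidence(static_conf: float,
                      dynamic_conf: float,
                      source_authority: float,
                      weights: tuple[float, float, float] = (0.4, 0.4, 0.2)) -> float:
    """Weighted sum of the three confidence inputs (weights are hypothetical)."""
    w_static, w_dynamic, w_source = weights
    return (w_static * static_conf
            + w_dynamic * dynamic_conf
            + w_source * source_authority)


def may_enter_fusion(confidence: float, threshold: float = 0.7) -> bool:
    """Only knowledge units above the (hypothetical) threshold are fused."""
    return confidence > threshold


def hebbian_update(weight: float,
                   activation_i: float,
                   activation_j: float,
                   learning_rate: float = 0.1,
                   forgetting: float = 0.05) -> float:
    """Assumed update: w <- w + eta * a_i * a_j - lambda * w."""
    return weight + learning_rate * activation_i * activation_j - forgetting * weight


if __name__ == "__main__":
    conf = fusion_confidence(0.9, 0.8, 0.5)
    print(round(conf, 2), may_enter_fusion(conf))

    w = 1.0
    for _ in range(3):            # relationship never activated: weight decays
        w = hebbian_update(w, 0.0, 0.0)
    print(round(w, 4))
```

With both activation levels at zero the update reduces to pure decay, which corresponds to the forgetting behaviour; a pair of frequently co-activated entities gains weight on every update.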
[0028] Specifically, the continuous learning closed loop is dynamically adjusted through a feedback control module, which monitors key system indicators and updates parameters as follows. The feedback control module employs a PID-like control strategy: after each knowledge update it monitors, in real time, the accuracy of the knowledge graph question-answering function and the query miss rate. When the accuracy drops by more than a preset threshold, or the query miss rate rises by more than a preset threshold, the module automatically raises the confidence threshold for new knowledge fusion or reduces the learning rate in the weight update formula. Conversely, when a significant lack of knowledge coverage is detected, the confidence threshold is lowered or the data collection frequency is increased, so as to balance knowledge freshness against accuracy and maintain closed-loop stability.
[0029] In a specific embodiment of the present invention, the feedback control module borrows the PID control concept from engineering and uses the system's key performance indicators (question-answering accuracy and query miss rate) as the controlled variables for real-time monitoring. When monitoring detects that the accuracy has dropped by more than its threshold, or that the miss rate has risen by more than its threshold, this indicates that recently incorporated knowledge may have reduced the quality of the graph. The controller then automatically raises the confidence threshold for new knowledge fusion or reduces the learning rate, which shifts the system into a conservative mode that absorbs new knowledge more cautiously. Conversely, if the system behaves too conservatively and knowledge coverage is insufficient, the parameters are adjusted in the opposite direction to encourage more active learning.
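As a non-limiting illustration, the adjustment logic of the feedback control module can be sketched in Python as a threshold-triggered rule. This is a simplification and not a full PID controller. The names `ControlState` and `adjust`, the step size, the halving of the learning rate and all threshold values are hypothetical; the sketch also applies both conservative adjustments together, whereas the embodiment allows either one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlState:
    confidence_threshold: float = 0.7   # threshold for new knowledge fusion
    learning_rate: float = 0.1          # learning rate of the weight update


def adjust(state: ControlState,
           accuracy_drop: float,
           miss_rate_rise: float,
           coverage_gap: bool,
           *,
           max_accuracy_drop: float = 0.05,
           max_miss_rate_rise: float = 0.05,
           step: float = 0.05) -> ControlState:
    """Return the next controller state from the monitored indicators."""
    if accuracy_drop > max_accuracy_drop or miss_rate_rise > max_miss_rate_rise:
        # Quality degraded: become more conservative about new knowledge.
        return ControlState(
            confidence_threshold=min(1.0, state.confidence_threshold + step),
            learning_rate=state.learning_rate * 0.5,
        )
    if coverage_gap:
        # Coverage is insufficient: admit new knowledge more readily.
        return ControlState(
            confidence_threshold=max(0.0, state.confidence_threshold - step),
            learning_rate=state.learning_rate,
        )
    return state


if __name__ == "__main__":
    state = ControlState()
    state = adjust(state, accuracy_drop=0.10, miss_rate_rise=0.0, coverage_gap=False)
    print(state)
```

Checking for quality degradation before checking coverage gives accuracy priority when both conditions hold at once, which matches the conservative intent of the mechanism.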
[0030] The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitutions or modifications made by those skilled in the art within the scope of the technology disclosed in the present invention, based on the technical solution and its improved concept, should be covered within the scope of protection of the present invention.
Claims
1. A code context understanding method based on a large language model, characterized in that it comprises:
Step S1, constructing a unified project dynamic knowledge graph: parse the source code of the target project and construct its abstract syntax tree as the static semantic foundation; embed a monitoring agent in the code execution environment to collect function call sequences, variable state changes and exception trigger points in real time as dynamic execution trajectories; parse the project's Git commit history and construct a historical graph with commits as nodes and change relationships as edges; and, through entity alignment, integrate and store the three types of heterogeneous data (static syntax structure, dynamic execution trajectory, and historical evolution) in a unified, graph-structured project dynamic knowledge graph.
Step S2, dynamic execution trajectory feedback and evolutionary optimization based on the knowledge graph: when a code generation or understanding task is received, retrieve context and generate multiple candidate code solutions based on the current real-time development environment state and the project dynamic knowledge graph; for each candidate solution, extract its executable part and execute it against the provided test cases or in a sandbox environment to obtain, as feedback, a structured execution trajectory containing variable states and runtime behavior; and, using the execution trajectory feedback, iteratively optimize the candidate solutions through revision, reorganization and refinement operations so that the solutions continuously evolve along the reasoning trajectory.
Step S3, receiving the final code solution optimized by the evolutionary algorithm: perform multi-level closed-loop verification combining static analysis and runtime verification on the final code solution; when the verification succeeds, structure the new execution trajectory and solution generated by this task together with their associations with entities in the knowledge graph; and, based on the verification results and predefined update strategies, feed back and update the processed information to the project dynamic knowledge graph, thereby realizing a continuous learning closed loop from environmental awareness and code optimization to knowledge accumulation.
2. The code context understanding method based on a large language model according to claim 1, characterized in that step S1 includes:
S101, static syntax structure parsing: parse the source code of the target project and construct its abstract syntax tree (AST) through lexical analysis, syntax analysis, and semantic analysis, where the nodes of the AST represent non-terminal and terminal symbols in the program's syntax structure and the edges represent syntactic relationships;
S102, dynamic execution trajectory collection: embed a monitoring agent in the code running environment of the target project to collect function call sequences, variable state changes and exception trigger points in real time, forming a dynamic execution trajectory; the monitoring agent generates a globally unique tracking identifier through distributed tracing technology and passes it along the call chain to associate cross-process or cross-thread call events, and, by intercepting thread method calls or inter-thread communication primitives, captures the explicit and implicit thread context-switch information, which is represented as a triple;
S103, historical evolution construction: analyze the Git commit history of the target project, abstract each commit as a graph node and each parent-child relationship between commits as a directed edge, and construct a historical directed acyclic graph (DAG) with commits as nodes and change relationships as edges; each commit node is identified by a unique 40-digit hexadecimal commit hash generated by applying the SHA-1 algorithm to the commit metadata, which includes the committer, timestamp, commit message, parent commit hash, and tree object pointer;
S104, heterogeneous data alignment and unified knowledge graph construction: the abstract syntax tree obtained in step S101, the dynamic execution trajectory obtained in step S102, and the historical evolution DAG obtained in step S103 are aligned by entity and then integrated and stored in a unified, graph-structured project dynamic knowledge graph; the unified knowledge graph adopts a hierarchical structure, including an episode subgraph for storing original execution fragments, a semantic subgraph for storing the entities and relationships extracted from the episodes, and a community subgraph for representing entity communities; a graph attention network (GAT) or graph convolutional network (GCN) is used to perform representation learning and reasoning on the integrated knowledge graph.
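By way of non-limiting illustration of the commit identifier described in step S103, the following Python sketch computes a SHA-1 object identifier the way Git does for its objects: the hash covers a header consisting of the object type, a space, the byte length and a NUL byte, followed by the object body. The helper names `git_object_id` and `commit_body` and the sample author, timestamp and message are hypothetical; the tree identifier used in the example is Git's well-known empty-tree hash.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <length>\\0' + body, returned as 40 hex digits."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parents: list[str],
                author: str, committer: str, message: str) -> bytes:
    """Serialize commit metadata in the textual layout of a Git commit object."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    return "\n".join(lines).encode("utf-8")


if __name__ == "__main__":
    who = "Jane Doe <jane@example.com> 1700000000 +0000"   # sample data
    body = commit_body(
        tree="4b825dc642cb6eb9a060e54bf8d69288fbee4904",   # Git's empty tree
        parents=[],
        author=who,
        committer=who,
        message="Initial commit\n",
    )
    print(git_object_id("commit", body))
```

Because the parent hashes are part of the hashed body, changing any ancestor changes every descendant identifier, which is what makes the commit hash usable as a stable node key in the historical DAG.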
3. The code context understanding method based on a large language model according to claim 2, characterized in that the construction of the abstract syntax tree in step S101 employs a top-down or bottom-up parsing method: when top-down parsing is used, the parser follows a set of syntax rules in which S is the start symbol, T and F are non-terminal symbols, and F is a terminal symbol; the parser starts with the start symbol S and derives recursively until all input terminal symbols are matched, thereby building the AST; when bottom-up parsing is used, the parser follows the corresponding syntax rules, starts from the input terminal symbol sequence, and uses a stack to reduce repeatedly until the input has been reduced to the start symbol S, at which point the AST construction is complete.
4. The code context understanding method based on a large language model according to claim 2, characterized in that the monitoring agent described in step S102 is implemented in the following ways: for the Java language environment, a Java agent based on the JVMTI interface implants the monitoring logic when bytecode is loaded; for the JavaScript language environment, the target function is wrapped by a Proxy object, and the function call parameters and return values are recorded in the apply interceptor; the collection metrics for the dynamic execution trajectory include function call frequency, average response time, exception stack traces, and variable value sequences; a time window threshold is set so that only the trajectory data within the most recent time period is retained, which controls the storage scale.
5. The code context understanding method based on a large language model according to claim 2, characterized in that the construction and optimization of the historical evolution DAG in step S103 includes: performing a topological sort on the DAG to ensure that all parent nodes have been processed before any child node is processed, the time complexity of the sort being O(N + E), where N is the number of nodes and E is the number of edges; calculating a generation number for each commit node, the generation number being an integer based on the topological depth or a corrected value combined with the commit timestamp, which is used to quickly determine reachability between nodes: if the generation number of node A is less than that of node B, then B cannot be an ancestor of A; and using a Bloom filter to accelerate queries for a specific path in the DAG, with the false positive rate threshold of the Bloom filter set to less than 0.1%.
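By way of non-limiting illustration, the topological sort and generation numbers of the historical DAG can be sketched in Python using Kahn's algorithm, which runs in time proportional to the number of nodes plus edges. The sketch assumes the convention used by Git's commit-graph, in which a root commit has generation 1 and every other commit has a generation one larger than the maximum of its parents, so that a commit can only be an ancestor of a commit with a strictly larger generation number. The function names and the sample graph are hypothetical.

```python
from collections import deque


def generation_numbers(parents: dict[str, list[str]]) -> dict[str, int]:
    """Compute generation numbers in topological order (Kahn's algorithm).

    `parents` maps each commit id to the ids of its parents; every parent
    must itself appear as a key.
    """
    children: dict[str, list[str]] = {commit: [] for commit in parents}
    pending = {commit: len(ps) for commit, ps in parents.items()}
    for commit, ps in parents.items():
        for parent in ps:
            children[parent].append(commit)

    queue = deque(commit for commit, count in pending.items() if count == 0)
    generation: dict[str, int] = {}
    while queue:
        commit = queue.popleft()
        # All parents were processed earlier, so their generations are known.
        generation[commit] = 1 + max(
            (generation[p] for p in parents[commit]), default=0
        )
        for child in children[commit]:
            pending[child] -= 1
            if pending[child] == 0:
                queue.append(child)

    if len(generation) != len(parents):
        raise ValueError("commit graph contains a cycle")
    return generation


def may_be_ancestor(a: str, b: str, generation: dict[str, int]) -> bool:
    """False means `a` is certainly not an ancestor of `b`; True means a
    full reachability walk is still required."""
    return generation[a] < generation[b]


if __name__ == "__main__":
    history = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
    gen = generation_numbers(history)
    print(gen, may_be_ancestor("d", "a", gen))
```

The generation check is only a cheap negative filter: it rules out impossible ancestor relationships without traversing the graph, and the expensive walk is reserved for the remaining pairs.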
6. The code context understanding method based on a large language model according to claim 2, characterized in that the entity alignment described in step S104 employs an algorithm based on similarity calculation: for code entities from the AST, execution entities from the dynamic trajectory, and change entities from the commit history, the cosine similarity of their names, contexts, or features is calculated; when the cosine similarity exceeds a preset threshold, the entities are identified as the same entity and merged in the unified graph, and their static definitions, dynamic instances, and historical change edges are connected using this entity as a hub.
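By way of non-limiting illustration, the similarity-based alignment can be sketched in Python as follows. Representing each entity name by character trigram counts is an assumption made to keep the example self-contained; a practical system would more likely compare learned embeddings of names and contexts. The function names and the threshold value 0.8 are hypothetical.

```python
import math
from collections import Counter


def trigram_features(name: str) -> Counter:
    """Character trigram counts of the lowercased name (hypothetical features)."""
    text = name.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def same_entity(name_a: str, name_b: str, threshold: float = 0.8) -> bool:
    """Entities are merged when their similarity exceeds the threshold."""
    similarity = cosine_similarity(trigram_features(name_a),
                                   trigram_features(name_b))
    return similarity > threshold


if __name__ == "__main__":
    # An AST function name versus the name seen in a runtime trace.
    print(same_entity("parseConfig", "parseconfig"))
    print(same_entity("parseConfig", "renderView"))
```

Once two records are judged to be the same entity, the merged node serves as the hub that links the static definition, the runtime instances and the commits that changed it.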
7. The code context understanding method based on a large language model according to claim 1, characterized in that step S2 includes:
S201, context-aware candidate solution generation: upon receiving a code generation or understanding task, historical code fragments, execution trajectories, and fix cases that are semantically similar to the task description are retrieved from the project's dynamic knowledge graph based on the current real-time development environment state; the retrieved context is used to prompt the large language model, and a temperature parameter together with a nucleus sampling strategy is used to generate N initial candidate code solutions in parallel, ensuring the diversity and novelty of the generated results;
S202, acquisition and quantitative evaluation of structured execution trajectories: each candidate code solution generated in step S201 is executed against the provided test case set or in an isolated sandbox environment, and its complete runtime behavior is monitored and recorded to form a structured execution trajectory containing the function call sequence, key variable value changes, resource consumption and exception information; based on this trajectory, a multi-dimensional evaluation module calculates a quantitative score for each candidate solution, the multi-dimensional evaluation module including a static analysis scorer, a dynamic execution scorer and a graph consistency scorer;
S203, trajectory-based evolutionary iterative optimization: using the execution trajectories and quantitative scores obtained in step S202 as feedback signals, the candidate solution pool is optimized over multiple rounds of iteration; the optimization process involves three core operations, namely revision, reorganization, and refinement, which enable the solutions to evolve continuously at the level of the reasoning trajectory rather than the level of the code text;
S204, selection and verification of the final solution: the optimization loop terminates after a preset number of rounds of evolutionary iteration, or when the evaluation score of the optimal solution exceeds the success threshold; the optimal solution is then selected from the final solution pool based on the comprehensive evaluation score and verified again in a complete test suite and simulation environment to ensure its functional correctness, performance compliance, and compatibility with the project architecture.
8. The code context understanding method based on a large language model according to claim 7, characterized in that:
The comprehensive reward function of the multi-dimensional evaluation module described in step S202 is calculated as a weighted combination of four rewards, where the comprehensive reward value, in units of one, represents the quality of a candidate solution in a given iteration round. The functional correctness reward, in units of one, is calculated at its core from ... indicators, obtained from the pass/fail results of the execution trajectory on the test suite: for $c$ candidate solutions of which $s$ pass the tests, the single-round pass probability is estimated as $s/c$; this metric directly measures the actual utility of the code. The reasoning quality reward, in units of one, is calculated by analyzing the coherence of the control flow logic, the integrity of the data flow, and the robustness of the exception handling in the execution trajectory, and can be quantified by an LLM evaluator or a rule set. The cost efficiency reward, whose unit is the reciprocal of time in seconds or of memory/CPU cycles, is calculated from the actual execution time or resource consumption and the number of evolutionary iterations $t$, with an attenuation coefficient that encourages faster convergence as the iterations progress. The graph consistency reward, in units of one, is the mean cosine similarity between the execution trajectory pattern of the solution and the historical successful trajectories stored in the knowledge graph for the same module and the same exception type. The weighting coefficients of the four rewards satisfy a preset constraint, and the coefficient of the functional correctness reward is usually assigned the highest weight, ≥ 0.5, with the core objective of ensuring functional correctness.
The revision operation described in step S203 is as follows: candidate solutions whose evaluation score is below the revision threshold are selected and a revision process is initiated. This process uses the reflective capabilities of the large language model: the code of the current solution and its failed or inefficient execution trajectory are taken as input, and the model is prompted to analyze the defective links in the trajectory and generate revision instructions. The revision follows a chain of reasoning of localization, diagnosis and patching, and the revised code must pass the core test cases that caused the original solution to fail. This process mimics the self-adjustment and reflection mechanism of an intelligent agent.
The reorganization operation described in step S203 adopts an evolutionary search algorithm based on execution reward, which selects the top M solutions with the highest reward values from the current solution pool as the elite group, where M corresponds to the top 30% of the solutions.
The refinement operation described in step S203 re-evaluates all solutions in the pool after each evolutionary iteration and then selects among them using a ranking algorithm based on code test consistency. This algorithm treats the execution results of each solution on all test cases as a vector and identifies consensus sets through K-means clustering. It considers not only the reward value of each individual solution but also the size of its consensus set and the average reward value of that set, prioritizing solutions that are supported by the consensus of multiple other independent solutions.
9. The code context understanding method based on a large language model according to claim 1, characterized in that:
The multi-level closed-loop verification in step S3 includes constructing a verification pipeline consisting of a static verification module and a dynamic verification module. The static verification module analyzes the code without executing it: it generates an abstract syntax tree through lexical analysis and syntax analysis, generates a directed control flow graph through control flow analysis, traverses the control flow graph for data flow analysis, and applies abstract interpretation techniques. The dynamic verification module executes the code in a controlled environment. The results of static verification and dynamic verification are cross-checked; if a conflict is found, re-analysis is triggered or the verification is judged to have failed, thus forming a verification closed loop.
The conditions for passing the verification are determined by a set of quantifiable threshold indicators: the number of static analysis defects must be lower than a threshold, which is a positive integer set based on historical project data; the number of critical security vulnerabilities must be zero; the code complexity, measured as cyclomatic complexity, must be below a preset threshold, where the cyclomatic complexity is calculated as $V(G) = E - N + 2P$, with $E$ the number of edges in the control flow graph, $N$ the number of nodes, and $P$ the number of connected components, all of which are dimensionless; the assertion pass rate verified at runtime must reach a preset percentage, which is 100%; and the performance metrics must lie within a range around the historical baseline value defined by a preset percentage $K$.
The algorithm and metrics for feeding back and updating the processed information to the project's dynamic knowledge graph include: calculating a fusion confidence for each new knowledge unit extracted from the code solution, as the weighted sum of the static verification confidence, the dynamic verification confidence, and the source authority, where the three weighting coefficients are dimensionless quantities between 0 and 1; only when the fusion confidence is higher than the preset confidence threshold is the new knowledge unit allowed to enter the fusion process. During fusion, consistency verification checks whether the new relationship conflicts with the existing relationships in the graph in terms of transitive closure; if the number of detected conflicts exceeds a preset threshold, the conflict resolution sub-process is triggered; otherwise, fusion is executed.
The weights of the relationships between entities in the knowledge graph are dynamically adjusted: the connection weight between entity nodes $i$ and $j$ is adjusted iteratively based on an improved Hebb rule, simulating the reinforcement and forgetting mechanisms of neural networks. The weight update formula is expressed in terms of the activation levels of entity nodes $i$ and $j$ within the most recent time window $T$, a learning-rate reinforcement coefficient, and a forgetting coefficient. The activation level is quantified by the frequency with which the entity is associated in queries, verification tasks, or code solutions, and is a dimensionless positive number; the learning-rate reinforcement coefficient and the forgetting coefficient are preset positive decimals; the weights are dimensionless. When a successful closed-loop verification task reinforces an association, the activation of the related entities increases, which increases their connection weight. Relationships that are activated frequently are strengthened, while those that are not activated for a long time gradually weaken.
10. The code context understanding method based on a large language model according to claim 9, characterized in that the continuous learning closed loop is dynamically adjusted through a feedback control module, which monitors key system indicators and updates parameters as follows: the feedback control module employs a PID-like control strategy and, after each knowledge update, monitors in real time the accuracy of the knowledge graph question-answering function and the query miss rate; when the accuracy drops by more than a preset threshold, or the query miss rate rises by more than a preset threshold, the module automatically raises the confidence threshold for new knowledge fusion or reduces the learning rate in the weight update formula; conversely, when a significant lack of knowledge coverage is detected, the confidence threshold is lowered or the data collection frequency is increased, so as to balance knowledge freshness against accuracy and maintain closed-loop stability.