Code generation method based on multi-granularity code knowledge graph context engineering optimization
By constructing a multi-granularity code knowledge graph and an attention flow evaluation model, the code generation process is optimized, solving the problems of context redundancy and dependency loss in code generation in existing technologies, and improving the accuracy and logical consistency of code generation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HANGZHOU YUANTIAO TECH CO LTD
- Filing Date
- 2026-02-11
- Publication Date
- 2026-06-16
AI Technical Summary
Existing intelligent code generation methods do not model the deep topological relationships between heterogeneous entities and multi-granularity structures in a codebase, so the prompts they build struggle to cover complex cross-file logical dependencies. They also lack effective context engineering optimization techniques, which can easily lead to code redundancy and missing dependencies.
By constructing a multi-granularity code knowledge graph, parsing cross-file call relationships and inheritance relationships, using semantic similarity algorithms to locate core nodes, combining attention flow evaluation models to optimize the contextual knowledge subgraph, and introducing a verification feedback mechanism, an iterative optimization closed loop is formed to improve the accuracy of code generation.
It solves the problems of context redundancy and dependency loss in code generation for large-scale software projects, improves the accuracy and engineering applicability of code generation, and has good semantic accuracy and logical consistency.
Figure CN122219902A_ABST
Description
Technical Field
[0001] This invention belongs to the field of software engineering technology, and specifically relates to a code generation method based on multi-granularity code knowledge graph context engineering optimization.
Background Technology
[0002] With the rise of large-scale pre-trained models (such as GPT-4 and Llama 3), AI-based code generation has become a key means of improving software development efficiency. However, when applied to real-world enterprise software projects, existing intelligent code generation methods still face serious challenges. First, existing context construction methods mostly rely on simple text fragment retrieval and do not capture the deep topological relationships between heterogeneous entities (such as configuration files and API protocols) and multi-granularity structures (such as statements, functions, and classes) in the codebase, so the resulting prompts struggle to cover complex cross-file logical dependencies. Second, lacking effective context engineering optimization, systems often feed large amounts of redundant implementation detail directly into the model, which can easily induce code hallucination. How to build an intelligent code generation method that deeply understands the heterogeneous relationships of a code project and achieves fine-grained context optimization has therefore become a key technical problem in the field of intelligent software development.
Summary of the Invention
[0003] To overcome the shortcomings of existing technologies, this invention provides a code generation method based on multi-granularity code knowledge graph context engineering optimization. It aims to utilize large language models for code development in large-scale software projects. Through multi-granularity graph modeling and attention flow evaluation mechanisms, it achieves accurate extraction and dynamic compression of code context, thereby improving the semantic accuracy of large-scale code generation and solving the problems of context redundancy and dependency loss in intelligent code generation in large-scale software project development.
[0004] To achieve the above objectives, the present invention can be implemented using the following technical solution. The code generation method based on multi-granularity code knowledge graph context engineering optimization includes the following steps:
Step S1: Extract multi-dimensional semantic features from the source software project through static analysis, and construct a multi-granularity code knowledge graph covering projects, modules, classes / interfaces, and functions; parse cross-file call relationships, inheritance relationships, and type references, and establish topological association mappings between entities of different granularities.
Step S2: Based on the functional requirements of the target task, locate the core nodes in the multi-granularity code knowledge graph, extract candidate code units using a semantic similarity algorithm, trace source dependencies upward and expand implementations downward based on the topological association mapping, and construct a modular context knowledge subgraph that is strongly related to the target task.
Step S3: Optimize the modular context knowledge subgraph using the attention flow evaluation model. Quantify each code unit's score by calculating node in-degree, topological distance, and semantic weight, and prioritize code units by score to achieve efficient compression of the code representation.
Step S4: Map the code representation into the prompt space to form a prompt sequence containing structured constraints and type declarations. Input the sequence into the large model for intelligent code generation and verify the results. If a missing dependency or a logical conflict is detected, feed it back to step S2 for iterative optimization of the knowledge subgraph.
[0005] Furthermore, step S1 includes:
Step S11. Define the node set, which comprises code nodes covering the hierarchical code structure of projects, modules, classes / interfaces, and functions; configuration-file nodes; and operating-environment nodes.
Step S12. Define the edge set, in which each edge carries a weight coefficient assigned according to its relation type: one coefficient for call relationships, one for inheritance relationships, and one for configuration mapping relationships, with the three coefficients satisfying a preset constraint.
Step S13. Integrate the weighted edges and nodes to construct the multi-granularity code knowledge graph.
[0006] Furthermore, step S2 includes:
Step S21. Calculate the semantic similarity between the task description and the graph nodes, and select the Top-K nodes as the initial seed nodes.
Step S22. Taking the seed nodes as centers, perform a path search and calculate the subgraph coverage function, which is defined over each node's set of incoming edges and the weights of those edges; the edge weight characterizes how strongly the node is depended on in the project topology.
Step S23. Based on the coverage result, extract a complete subgraph containing the core logic and its necessary context entities.
[0007] Furthermore, step S3 includes:
Step S31. Calculate the attention score of each neighboring node from the graph topology; the score depends on the node's set of neighboring nodes, the edge weights, and an attenuation factor.
Step S32. Preset an upper and a lower attention threshold and represent nodes at different granularities: a node whose score is above the upper threshold is mapped to its complete source code text; a node whose score lies between the two thresholds is mapped to function-name and class-declaration text; a node whose score is below the lower threshold is mapped to semantic summary text.
Step S33. Serialize the text fragments of different granularities into a context instruction stream.
[0008] Furthermore, step S4 includes:
Step S41. Encapsulate the context instruction stream into a prompt and input it into a large model for intelligent code generation.
Step S42. Extract the external reference symbols in the generated code, compare them with the set of subgraph nodes, and calculate the symbol hit rate.
Step S43. If the hit rate falls below the preset threshold or a logical conflict is detected, identify the graph nodes pointed to by the missing symbols and re-execute steps S2 to S41 until the generated code passes verification.
[0009] Compared with the prior art, the present invention has the following advantages: it addresses context redundancy and dependency loss in large-scale code generation by employing multi-granularity code knowledge graph modeling and attention flow evaluation, thereby improving the accuracy of repository-level code generation. By constructing a multi-granularity code knowledge graph, it achieves deep modeling of complex program structures within projects; it dynamically constructs context knowledge subgraphs based on semantic similarity; it uses an attention flow evaluation mechanism to generate optimized context instruction streams; and it introduces a verification-based feedback mechanism to form an iterative optimization loop, thus improving the accuracy of large-scale code generation and demonstrating good engineering applicability.
Attached Figure Description
[0010] Figure 1 is a flowchart of the method of the present invention.
Detailed Implementation
[0011] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention.
[0012] Example 1
As shown in Figure 1, this embodiment provides a code generation method based on multi-granularity code knowledge graph context engineering optimization, including the following steps: Step S1: Extract multi-dimensional semantic features from the source software project through static analysis, and construct a multi-granularity code knowledge graph covering projects, modules, classes / interfaces, and functions; parse cross-file call relationships, inheritance relationships, and type references, and establish topological association mappings between entities of different granularities.
[0013] Define the node set, which comprises code nodes covering the hierarchical code structure of projects, modules, classes / interfaces, and functions; configuration-file nodes; and runtime-environment nodes. Define the edge set, in which each edge carries a weight coefficient assigned according to its relation type: one coefficient for call relationships, one for inheritance relationships, and one for configuration mapping relationships, with the three coefficients satisfying a preset constraint.
[0014] By integrating weighted edges and nodes, a multi-granularity code knowledge graph can be constructed.
[0015] Step S2: Based on the functional requirements of the target task, locate the core nodes in the multi-granularity code knowledge graph, extract candidate code units using the semantic similarity algorithm, trace the source dependencies upward and expand the implementation downward based on the topological association mapping, and construct a modular context knowledge subgraph that is strongly related to the target task.
[0016] Calculate the semantic similarity between the task description and the graph nodes, and select the Top-K nodes as the initial seed nodes. Taking the seed nodes as centers, perform a path search and calculate the subgraph coverage function, which is defined over each node's set of incoming edges and the weights of those edges; the edge weight represents how strongly the node is depended on in the project topology.
[0017] Based on the coverage result, extract a complete subgraph containing the core logic and its necessary context entities.
[0018] Step S3: Optimize the modular context knowledge subgraph using the attention flow evaluation model. Quantify the code unit score by calculating node in-degree, topological distance, and semantic weight. Prioritize the code unit based on the score to achieve efficient compression of code representation.
[0019] Calculate the attention score of each neighboring node from the graph topology; the score depends on the node's set of neighboring nodes, the edge weights, and an attenuation factor.
[0020] Preset an upper and a lower attention threshold and represent nodes at different granularities: a node whose score is above the upper threshold is mapped to its complete source code text; a node whose score lies between the two thresholds is mapped to function-name and class-declaration text; a node whose score is below the lower threshold is mapped to semantic summary text. Serialize the text fragments of different granularities into a context instruction stream.
[0021] Step S4: Map the code representation into the prompt space to form a prompt sequence containing structured constraints and type declarations. Input the sequence into the large model for intelligent code generation and verify the results. If a missing dependency or a logical conflict is detected, feed it back to step S2 for iterative optimization of the knowledge subgraph.
[0022] Encapsulate the context instruction stream into a prompt and input it into a large model for intelligent code generation. Extract the external reference symbols in the generated code, compare them with the set of subgraph nodes, and calculate the symbol hit rate. If the hit rate falls below the preset threshold or a logical conflict is detected, identify the graph nodes pointed to by the missing symbols and re-execute steps S2 to S4 until the generated code passes verification.
[0023] Example 2
[0024] This embodiment provides a code generation method based on multi-granularity code knowledge graph context engineering optimization, taking order refunds in a Spring Boot e-commerce system as an example. The method includes:
[0025] Step S1: Extract multi-dimensional semantic features from the source software project through static analysis, and construct a multi-granularity code knowledge graph covering projects, modules, classes / interfaces, and functions; parse cross-file call relationships, inheritance relationships, and type references, and establish topological association mappings between entities of different granularities.
[0026] Step S11. Define the node set. The code nodes include the project node EcommerceSystem; package nodes such as com.eshop.controller and com.eshop.service; class nodes such as OrderController, OrderService, and OrderServiceImpl; method nodes such as processRefund(), createRefund(), and restoreStock(); and entity nodes such as Order, OrderItem, and RefundRecord. The configuration nodes include application.yml, redis-config.properties, etc. The environment nodes include JDK-17, SpringBoot-2.7.0, MySQL-8.0, etc.
[0027] Step S12. Define the edge set, with a call-relationship weight of 0.75, an inheritance-relationship weight of 0.95, and a configuration-relationship weight of 0.45. Example edges: OrderServiceImpl → OrderService (0.95), OrderController.refundOrder() → OrderService.processRefund() (0.75), OrderService → transaction-config.yml (0.45).
[0028] Step S13. Construct the knowledge graph and store it in the Neo4j graph database. A partial structure is shown below:
[0029] OrderController.refundOrder() ↓ CALLS (0.75)
[0030] OrderService.processRefund()
[0031] → RefundService.createRefund()
[0032] → ProductService.restoreStock()
[0033] → OrderRepository.updateStatus()
[0034] → NotificationService.sendMsg()
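The node and edge definitions of steps S11 to S13 could be held in memory as a small weighted directed graph. The following is a minimal Java sketch under stated assumptions, not the implementation described in this document: the class name `CodeGraphSketch`, the `NodeKind` and `Relation` enums, and all method names are hypothetical, and the summed incoming weight is only one simple way to express how strongly a node is depended on. Only the node names and the weights 0.75, 0.95, and 0.45 are taken from the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of a weighted multi-granularity code graph.
final class CodeGraphSketch {
    enum NodeKind { PROJECT, PACKAGE, CLASS, METHOD, ENTITY, CONFIG, ENVIRONMENT }

    // Relation types carrying the example weights from step S12.
    enum Relation {
        CALLS(0.75), INHERITS(0.95), CONFIGURES(0.45);
        final double weight;
        Relation(double weight) { this.weight = weight; }
    }

    static final class Edge {
        final String from;
        final String to;
        final Relation relation;
        Edge(String from, String to, Relation relation) {
            this.from = from;
            this.to = to;
            this.relation = relation;
        }
    }

    private final Map<String, NodeKind> nodes = new HashMap<>();
    private final Map<String, List<Edge>> incoming = new HashMap<>();

    void addNode(String name, NodeKind kind) {
        nodes.put(name, kind);
        incoming.putIfAbsent(name, new ArrayList<>());
    }

    void addEdge(String from, String to, Relation relation) {
        if (!nodes.containsKey(from) || !nodes.containsKey(to)) {
            throw new IllegalArgumentException("both endpoints must be added as nodes first");
        }
        incoming.get(to).add(new Edge(from, to, relation));
    }

    // Assumed measure: sum of the weights of all edges pointing at the node.
    double incomingWeight(String name) {
        List<Edge> edges = incoming.get(name);
        if (edges == null) {
            return 0.0;
        }
        double sum = 0.0;
        for (Edge e : edges) {
            sum += e.relation.weight;
        }
        return sum;
    }

    // Builds the three example edges listed in step S12.
    static CodeGraphSketch refundExample() {
        CodeGraphSketch g = new CodeGraphSketch();
        g.addNode("OrderServiceImpl", NodeKind.CLASS);
        g.addNode("OrderService", NodeKind.CLASS);
        g.addNode("OrderController.refundOrder()", NodeKind.METHOD);
        g.addNode("OrderService.processRefund()", NodeKind.METHOD);
        g.addNode("transaction-config.yml", NodeKind.CONFIG);
        g.addEdge("OrderServiceImpl", "OrderService", Relation.INHERITS);
        g.addEdge("OrderController.refundOrder()", "OrderService.processRefund()", Relation.CALLS);
        g.addEdge("OrderService", "transaction-config.yml", Relation.CONFIGURES);
        return g;
    }
}
```

Attaching the weight to the relation type rather than to each edge keeps the per-type coefficients in one place, which is convenient if they need to be tuned per project.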
[0035] Step S2. Based on the requirements of the target task, locate the core nodes in the multi-granularity code knowledge graph, extract candidate code units using the semantic similarity algorithm, trace the source dependencies upward and expand the implementation downward based on the topological association mapping, and construct a modular context knowledge subgraph that is strongly related to the target task.
[0036] Step S21. Calculate the semantic similarity between the task description and the graph nodes, and select the top-5 nodes as the initial seed nodes: OrderService.processRefund(), RefundService.createRefund(), OrderController.refundOrder(), ProductService.restoreStock(), OrderRepository.updateStatus().
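Seed selection of this kind can be illustrated with a plain Top-K ranking over similarity scores. The sketch below assumes that the task description and every graph node have already been turned into embedding vectors by some external model and that cosine similarity is the similarity measure; neither assumption is stated in this document, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Top-K seed selection; the embedding vectors are assumed inputs.
final class SeedSelectionSketch {
    // Cosine similarity of two vectors of equal length; 0 if either is all zeros.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the names of the k nodes most similar to the task vector, best first.
    static List<String> topK(double[] task, Map<String, double[]> nodeVectors, int k) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        final Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, double[]> e : nodeVectors.entrySet()) {
            scores.put(e.getKey(), cosine(task, e.getValue()));
        }
        List<String> names = new ArrayList<>(scores.keySet());
        names.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return new ArrayList<>(names.subList(0, Math.min(k, names.size())));
    }
}
```

With K = 5, a call such as `SeedSelectionSketch.topK(taskVector, nodeVectors, 5)` would return five node names in descending similarity order.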
[0037] Step S22. Calculate the subgraph coverage: initial subgraph (5 seed nodes) = 0.024; first-level expansion = 0.063 (adding the Order entity, etc.); second-level expansion = 0.110 (adding the Product entity, etc.); third-level expansion = 0.144 > 0.12 (threshold), so the threshold is first exceeded at the third level.
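The level-by-level expansion can be read as a loop that keeps widening the subgraph until its coverage exceeds the threshold. The sketch below shows only that control flow: the coverage value per level is supplied by the caller, because the coverage function itself is not reproduced here. The class name, method name, and the `maxLevel` cap are hypothetical.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical expansion loop; coverageAtLevel stands in for the real coverage function.
final class CoverageExpansionSketch {
    // Returns the first level (0 = seed nodes only) whose coverage exceeds the threshold,
    // or -1 if no level up to maxLevel does.
    static int firstSufficientLevel(IntToDoubleFunction coverageAtLevel, double threshold, int maxLevel) {
        for (int level = 0; level <= maxLevel; level++) {
            if (coverageAtLevel.applyAsDouble(level) > threshold) {
                return level;
            }
        }
        return -1;
    }
}
```

Fed with the coverage values 0.024, 0.063, 0.110, and 0.144 and the threshold 0.12, the loop stops at level 3. The cap on the level is an added safeguard so that a graph that never reaches the threshold cannot expand indefinitely.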
[0038] Step S23. Extract the complete subgraph, including core method nodes (refund, inventory, etc.), entity class nodes (Order, RefundRecord), configuration nodes (transaction configuration, Redis configuration, etc.), and environment nodes (SpringBoot).
[0039] Step S3. Optimize the modular context knowledge subgraph using the attention flow evaluation model. Quantify the code unit score by calculating the node in-degree, topological distance, and semantic weight. Prioritize the code unit based on the score to achieve efficient compression of code representation.
[0040] Step S31. Calculate the attention score of each neighboring node from the graph topology, with source node OrderService.processRefund() (initial score 1), attenuation factor 0.8, and a maximum propagation depth of 3 layers.
[0041] The attention distribution results are shown below:
[0042] OrderService.processRefund() 1
[0043] RefundService.createRefund() 0.2
[0044] ProductService.restoreStock() 0.2
[0045] OrderRepository.updateStatus() 0.14
[0046] OrderController.refundOrder() 0.12
[0047] RefundRecord entity 0.048
[0048] Redis configuration 0.012
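One way to picture attention flowing outward from a source node is a breadth-first propagation in which each neighbor receives the parent's score, multiplied by the attenuation factor and by the neighbor's share of the parent's total edge weight. This rule is an assumption made for illustration and is not the formula of this document; the class name and the adjacency-map representation are hypothetical as well. Under this assumed rule, four callees with equal weight 0.75 each receive 1 × 0.8 × 0.75 / 3.0 = 0.2; the sketch is not claimed to reproduce the other scores listed above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical decay-weighted breadth-first propagation (assumed rule, see lead-in).
final class AttentionFlowSketch {
    // graph maps each node to its neighbors and the weight of the connecting edge.
    static Map<String, Double> propagate(Map<String, Map<String, Double>> graph,
                                         String source, double decay, int maxDepth) {
        Map<String, Double> score = new LinkedHashMap<>();
        Map<String, Integer> depth = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        score.put(source, 1.0);
        depth.put(source, 0);
        queue.add(source);
        while (!queue.isEmpty()) {
            String u = queue.remove();
            int d = depth.get(u);
            Map<String, Double> neighbors = graph.get(u);
            if (d >= maxDepth || neighbors == null) {
                continue;
            }
            double total = 0.0;
            for (double w : neighbors.values()) {
                total += w;
            }
            if (total == 0.0) {
                continue;
            }
            for (Map.Entry<String, Double> e : neighbors.entrySet()) {
                if (score.containsKey(e.getKey())) {
                    continue; // keep the score from the first (shortest) path
                }
                score.put(e.getKey(), decay * score.get(u) * e.getValue() / total);
                depth.put(e.getKey(), d + 1);
                queue.add(e.getKey());
            }
        }
        return score;
    }
}
```

Keeping only the first score assigned to a node means that scores follow the shortest path from the source, which ties the score to topological distance; summing contributions over all paths would be an equally plausible alternative.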
[0049] Step S32. Set the thresholds to 0.15 and 0.05 and map code at different granularities. For a score > 0.15, map the complete code, for example: `RefundResult processRefund(Long orderId, BigDecimal amount, String reason);` For a score < 0.15, map simplified code, for example: `boolean restoreStock(long productId, Integer quantity);` For a score < 0.05, map a semantic summary, for example that the Redis configuration is used for distributed locks, the transaction configuration, etc.
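The two-threshold mapping can be written as a small classification function. In the sketch below the enum and method names are hypothetical, and the handling of scores exactly equal to a threshold (treated as the middle tier at the upper bound and at the lower bound) is an assumption, since the text only gives strict inequalities.

```java
// Hypothetical mapping from an attention score to a representation granularity.
final class GranularitySketch {
    enum Granularity { FULL_SOURCE, SIGNATURE, SUMMARY }

    static Granularity classify(double score, double upper, double lower) {
        if (lower > upper) {
            throw new IllegalArgumentException("lower threshold must not exceed upper threshold");
        }
        if (score > upper) {
            return Granularity.FULL_SOURCE; // complete source code text
        }
        if (score >= lower) {
            return Granularity.SIGNATURE;   // function names and class declarations
        }
        return Granularity.SUMMARY;         // semantic summary text
    }
}
```

With the thresholds 0.15 and 0.05, a score of 1 maps to the complete source, 0.14 to a signature, and 0.012 to a summary.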
[0050] Step S33. Generate a structured instruction stream.
[0051] ===== Core Business Logic =====
[0052] [Complete Code] OrderService.processRefund()
[0053] [Complete Code] RefundService.createRefund()
[0054] ===== Supported Functions =====
[0055] [Simplified code] OrderController.refundOrder()
[0056] [Simplified code] ProductService.restoreStock()
[0057] ===== Basic Components =====
[0058] [Semantic Summary] Transaction configuration: Timeout 30 seconds
[0059] [Semantic Summary] Cache Configuration: Redis Distributed Lock
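A structured instruction stream of this shape can be produced by walking an ordered list of tagged fragments and emitting a section header whenever the section changes. The sketch below is illustrative only: the `Item` class and `serialize` method are hypothetical names, and the section and tag strings are copied from the example above.

```java
import java.util.List;

// Hypothetical serializer for a sectioned context instruction stream.
final class InstructionStreamSketch {
    static final class Item {
        final String section; // e.g. "Core Business Logic"
        final String tag;     // e.g. "Complete Code"
        final String text;    // the fragment itself, or its name
        Item(String section, String tag, String text) {
            this.section = section;
            this.tag = tag;
            this.text = text;
        }
    }

    // Items are expected to be pre-sorted so that each section's items are adjacent.
    static String serialize(List<Item> items) {
        StringBuilder sb = new StringBuilder();
        String current = null;
        for (Item item : items) {
            if (!item.section.equals(current)) {
                sb.append("===== ").append(item.section).append(" =====\n");
                current = item.section;
            }
            sb.append('[').append(item.tag).append("] ").append(item.text).append('\n');
        }
        return sb.toString();
    }
}
```

Ordering the items by descending attention score before serialization would place the most relevant fragments first in the prompt.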
[0060] Step S4. Encapsulate the context instruction stream into a prompt and input it into a large model for intelligent code generation.
[0061] Step S41. Input the prompt into the large model (GPT-4) to generate the refund function implementation code, part of which is as follows:
[0062] public class OrderServiceImpl implements OrderService {
[0063]     private RefundService refundService;
[0064]     public RefundResult processRefund(Long orderId, BigDecimal amount, String reason) {
[0065]         Order order = orderRepository.findById(orderId).orElseThrow(() -> new BusinessException("Order does not exist"));
[0066]         RefundRecord record = refundService.createRefund(orderId, amount, reason);
[0067]         productService.restoreStock(productId, quantity);
[0068]         order.setStatus(OrderStatus.REFUNDED);
[0069]         orderRepository.save(order);
[0070]         return new RefundResult(true, "Refund successful", record.getId());
[0071]     }
[0072] }
[0073] Step S42. Calculate the symbol hit rate. The generated code contains 25 symbols in total, 18 of which appear in the subgraph, giving a hit rate of 18 / 25 = 0.72.
[0074] Step S43. If the hit rate is greater than the threshold, code generation ends; if the hit rate is less than the threshold, analyze the missing symbols, recalculate the subgraph coverage, and regenerate the instruction stream.
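The verification step reduces to a set comparison between the symbols referenced by the generated code and the symbols present in the subgraph. The following sketch assumes the symbols have already been extracted as strings; the class and method names are hypothetical, and treating an empty symbol set as a full hit is an assumption. No threshold value is given in this document, so the caller supplies one.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical symbol hit-rate check for the verification feedback loop.
final class SymbolHitRateSketch {
    // Fraction of generated symbols that also appear in the subgraph.
    static double hitRate(Set<String> generatedSymbols, Set<String> subgraphSymbols) {
        if (generatedSymbols.isEmpty()) {
            return 1.0; // assumption: nothing referenced means nothing is missing
        }
        int hits = 0;
        for (String symbol : generatedSymbols) {
            if (subgraphSymbols.contains(symbol)) {
                hits++;
            }
        }
        return (double) hits / generatedSymbols.size();
    }

    // Symbols used by the generated code but absent from the subgraph.
    static Set<String> missingSymbols(Set<String> generatedSymbols, Set<String> subgraphSymbols) {
        Set<String> missing = new LinkedHashSet<>(generatedSymbols);
        missing.removeAll(subgraphSymbols);
        return missing;
    }

    // True when the hit rate is below the caller-supplied threshold.
    static boolean needsAnotherRound(double hitRate, double threshold) {
        return hitRate < threshold;
    }
}
```

With 25 generated symbols of which 18 are in the subgraph, `hitRate` returns 0.72, and `missingSymbols` returns the 7 symbols whose graph nodes would be candidates for the next subgraph expansion.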
[0075] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present invention.
Claims
1. A code generation method based on multi-granularity code knowledge graph context engineering optimization, characterized in that it includes the following steps: Step S1: Extract multi-dimensional semantic features from the source software project through static analysis, and construct a multi-granularity code knowledge graph covering projects, modules, classes / interfaces, and functions; parse cross-file call relationships, inheritance relationships, and type references, and establish topological association mappings between entities of different granularities; Step S2: Based on the functional requirements of the target task, locate the core nodes in the multi-granularity code knowledge graph, extract candidate code units using a semantic similarity algorithm, trace source dependencies upward and expand implementations downward based on the topological association mapping, and construct a modular context knowledge subgraph that is strongly related to the target task; Step S3: Optimize the modular context knowledge subgraph using the attention flow evaluation model, quantify each code unit's score by calculating node in-degree, topological distance, and semantic weight, and prioritize code units by score to achieve efficient compression of the code representation; Step S4: Map the code representation into the prompt space to form a prompt sequence containing structured constraints and type declarations, input the sequence into the large model for intelligent code generation, and verify the results; if a missing dependency or a logical conflict is detected, feed it back to step S2 for iterative optimization of the knowledge subgraph.
2. The code generation method based on multi-granularity code knowledge graph context engineering optimization according to claim 1, characterized in that step S1 includes: Step S11. Define the node set, which comprises code nodes covering the hierarchical code structure of projects, modules, classes / interfaces, and functions, configuration-file nodes, and operating-environment nodes; Step S12. Define the edge set, in which each edge carries a weight coefficient assigned according to its relation type, namely one coefficient for call relationships, one for inheritance relationships, and one for configuration mapping relationships, with the three coefficients satisfying a preset constraint; Step S13. Integrate the weighted edges and nodes to construct the multi-granularity code knowledge graph.
3. The code generation method based on multi-granularity code knowledge graph context engineering optimization according to claim 1, characterized in that step S2 includes: Step S21. Calculate the semantic similarity between the task description and the graph nodes, and select the Top-K nodes as the initial seed nodes; Step S22. Taking the seed nodes as centers, perform a path search and calculate the subgraph coverage function, which is defined over each node's set of incoming edges and the weights of those edges, the edge weight characterizing how strongly the node is depended on in the project topology; Step S23. Based on the coverage result, extract a complete subgraph containing the core logic and its necessary context entities.
4. The code generation method based on multi-granularity code knowledge graph context engineering optimization according to claim 1, characterized in that step S3 includes: Step S31. Calculate the attention score of each neighboring node from the graph topology, the score depending on the node's set of neighboring nodes, the edge weights, and an attenuation factor; Step S32. Preset an upper and a lower attention threshold and represent nodes at different granularities: a node whose score is above the upper threshold is mapped to its complete source code text, a node whose score lies between the two thresholds is mapped to function-name and class-declaration text, and a node whose score is below the lower threshold is mapped to semantic summary text; Step S33. Serialize the text fragments of different granularities into a context instruction stream.
5. The code generation method based on multi-granularity code knowledge graph context engineering optimization according to claim 1, characterized in that step S4 includes: Step S41. Encapsulate the context instruction stream into a prompt and input it into a large model for intelligent code generation; Step S42. Extract the external reference symbols in the generated code, compare them with the set of subgraph nodes, and calculate the symbol hit rate; Step S43. If the hit rate falls below the preset threshold or a logical conflict is detected, identify the graph nodes pointed to by the missing symbols and re-execute steps S2 to S41 until the generated code passes verification.