Business code generation method and project source code conversion model

By classifying and differentiating key information from the project source code and integrating it into a structured context package, which is then sent to a large language model, the problem of low code generation efficiency of large language models in enterprise-level Java projects is solved, achieving more efficient business code generation.

CN122240077A · Pending · Publication Date: 2026-06-19 · CHINA CONSTRUCTION BANK +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHINA CONSTRUCTION BANK
Filing Date
2026-03-23
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

When large language models generate code for enterprise-level Java projects, they struggle to acquire and understand all relevant code. The generated business code is therefore disconnected from the actual project, while supplying the complete source code consumes valuable context resources and can introduce errors.

Method used

The project source code is categorized into open-source standard code, custom core code, and custom non-core code. Key information, such as API signatures, source code of custom core code, and comments and deep summaries of custom non-core code, is obtained for each type. This information is then integrated into a structured context package and sent to a large language model.

Benefits of technology

It improves the efficiency of business code generation, addresses the difficulty large language models have in accurately determining what project classes do, avoids wasting context resources, and makes the generated code more consistent with the actual logic of the project.

✦ Generated by Eureka AI based on patent content.


Abstract

This application provides a business code generation method and a project source code conversion model, including: classifying the project source code to determine the code type of code segments in the project source code; wherein, the code type includes: open source standard code, custom core code, and custom non-core code; obtaining the API signature of the open source standard code and the first source code of the custom core code; obtaining the comment information of the custom non-core code, the second source code of the first code, and the deep digest of the second code; integrating the API signature, the first source code, the comment information, the second source code, and the deep digest into a structured context package; and sending the structured context package to a large language model so that the large language model generates business code based on the structured context package. This at least solves the problem of low efficiency in generating business code due to the difficulty in obtaining accurate class code functionality in related technologies, and achieves the effect of improving the efficiency of business code generation.

Description

Technical Field

[0001] This application relates to the technical field of intelligent business code generation, specifically to a business code generation method and a project source code conversion model.

Background Technology

[0002] With the widespread application of large language models in business code generation tasks, how to effectively provide appropriate contextual information to the Large Language Model (LLM) to overcome the inherent context window limitations of the LLM and improve the LLM's ability to understand specific project code (especially custom code lacking good documentation) has become a key challenge in improving the quality of LLM-based engineering code generation.

[0003] When dealing with large and complex enterprise Java projects, LLMs often struggle to obtain and understand all relevant code (for example, when the LLM's input consists only of table structures and prompts), leading to generated business logic code that is disconnected from the actual project. This is particularly problematic for the numerous non-core but functionally specific custom methods in a project. Without clear comments, an LLM struggles to accurately infer functionality from method signatures alone, while providing the complete source code can consume valuable context resources and introduce unpredictable errors.

Summary of the Invention

[0004] This application provides a business code generation method and a project source code conversion model, which at least solves the problem of low efficiency in generating business code due to the difficulty in obtaining accurate class code in related technologies.

[0005] According to one embodiment of this application, a business code generation method is provided, comprising: classifying project source code to determine the code type of code segments in the project source code; wherein, the code type includes: open source standard code, custom core code, and custom non-core code; obtaining the API signature of the open source standard code and the first source code of the custom core code; obtaining the comment information of the custom non-core code, the second source code of the first code, and the deep digest of the second code; integrating the API signature, the first source code, the comment information, the second source code, and the deep digest into a structured context package; and sending the structured context package to a large language model so that the large language model generates business code based on the structured context package.

[0006] According to another embodiment of this application, a project source code transformation model is provided, comprising: a code classification module for classifying project source code to determine the code type of code segments in the project source code; wherein the code type includes: open source standard code, custom core code, and custom non-core code; a first processing module for obtaining the API signature of the open source standard code and the first source code of the custom core code; a second processing module for obtaining the comment information of the custom non-core code, the second source code of the first code, and a deep digest of the second code; a context integration module for integrating the API signature, the first source code, the comment information, the second source code, and the deep digest into a structured context package; and a large language model interface module for sending the structured context package to a large language model so that the large language model generates business code based on the structured context package.

[0007] According to yet another embodiment of this application, a computer-readable storage medium is also provided, wherein a computer program is stored therein, and the computer program is configured to perform the steps in any of the above method embodiments when it is run.

[0008] According to yet another embodiment of this application, an electronic device is also provided, including a memory and a processor, wherein the memory stores a computer program and the processor is configured to run the computer program to perform the steps in any of the above method embodiments.

[0009] According to yet another embodiment of this application, a computer program product is also provided, including computer instructions that, when executed by a processor, implement the steps in any of the above method embodiments.

[0010] In one embodiment of this application, the project source code is first classified into three categories: open-source standard code, custom core code, and custom non-core code. Then, key information is obtained differently for each type of code, namely, the API signature of the open-source standard code, the first source code of the custom core code, the comment information of the custom non-core code, the corresponding source code, and the deep summary. Subsequently, this information is integrated into a structured context package, and this structured context package replaces the table structure and prompt words of the related technologies and is sent to the large language model. This structured context package is obtained based on the project source code and consumes fewer tokens from the large language model than the complete source code, while being more interpretable. It solves the core problem that the large language model has difficulty obtaining accurate class code functions and avoids the problem of wasting context resources. Thus, it effectively solves the problem of low efficiency in business code generation caused by the difficulty in obtaining accurate class code functions in related technologies, and achieves the effect of improving the efficiency of business code generation.

Attached Figure Description

[0011] The accompanying drawings, which are included to provide a further understanding of this application and form part of this application, illustrate exemplary embodiments and are used to explain this application, but do not constitute an undue limitation of this application. In the drawings:

[0012] Figure 1 is a hardware structure block diagram of the business code generation method according to an embodiment of this application;

[0013] Figure 2 is a flowchart of a business code generation method according to an embodiment of this application;

[0014] Figure 3 is a flowchart of a method for obtaining the second source code of the first code and a deep summary of the second code in the second custom non-core code, according to an embodiment of this application;

[0015] Figure 4 is a flowchart of a method for determining the first code and the second code based on the large language model's degree of understanding of the second custom non-core code, according to an embodiment of this application;

[0016] Figure 5 is a flowchart of another method for determining the first code and the second code based on the large language model's degree of understanding of the second custom non-core code, according to an embodiment of this application;

[0017] Figure 6 is a flowchart of the process of obtaining a deep summary when the level of understanding is the second level of understanding, according to an embodiment of this application;

[0018] Figure 7 is a structural block diagram of the project source code conversion model according to an embodiment of this application;

[0019] Figure 8 is a structural block diagram of the second processing module according to an embodiment of this application.

Detailed Implementation

[0020] The present application will be described in detail below with reference to the accompanying drawings and embodiments. It should be noted that, unless otherwise specified, the embodiments and features described in the embodiments of the present application can be combined with each other.

[0021] It should be noted that the terms "first," "second," etc., in the specification, claims, and drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.

[0022] The collection, storage, use, processing, transmission, provision, and disclosure of financial data or user data involved in the technical solution of this application all comply with the provisions of relevant laws and regulations and do not violate public order and good morals.

[0023] It should be noted that in the embodiments of this application, certain software, components, models and other existing solutions in the industry may be mentioned. These should be regarded as exemplary and are only intended to illustrate the feasibility of implementing the technical solution of this application. However, it does not mean that the applicant has used or necessarily used the solution.

[0024] The method embodiments provided in this application can be executed on a mobile terminal, a computer terminal, or a similar computing device. Taking running on a computer terminal as an example, Figure 1 is a hardware structure block diagram of the business code generation method according to an embodiment of this application. As shown in Figure 1, the computer terminal may include one or more processors 12 (only one is shown in Figure 1; the processor may include, but is not limited to, a microprocessor (MCU) or a programmable logic device) and a memory 14 for storing data. The computer terminal may further include a transmission device 16 for communication functions and an input/output device 18. Those skilled in the art will understand that the structure shown in Figure 1 is for illustrative purposes only and does not limit the structure of the computer terminal described above. For example, the computer terminal may include more or fewer components than those shown in Figure 1, or have a configuration different from that shown in Figure 1.

[0025] The memory 14 can be used to store computer programs, such as application software programs and modules, like the computer program corresponding to the business code generation method in this embodiment. The processor 12 executes various functional applications and implements the above-described methods by running the computer programs stored in the memory 14. The memory 14 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 14 may further include memory remotely located relative to the processor 12, and these remote memories can be connected to a computer terminal via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.

[0026] The transmission device 16 is used to receive or send data via a network. Specific examples of the network described above may include a wireless network provided by a telecommunications provider. In one example, the transmission device 16 includes a Network Interface Controller (NIC), which can connect to other network devices via a gateway to communicate with the Internet. In another example, the transmission device 16 may be a Radio Frequency (RF) module used for wireless communication with the Internet.

[0027] This application provides a method for generating business code. Figure 2 is a flowchart of a business code generation method according to an embodiment of this application. As shown in Figure 2, the process includes:

[0028] Step S201: Classify the project source code to determine the code type of the code segments in the project source code; the code type includes: open source standard code, custom core code, and custom non-core code;

[0029] In one implementation, classifying the project source code to determine the code type of code segments within the project source code includes: determining the code type in the project source code based on file indexes, wherein the project source code includes multiple code segments stored in different files.

[0030] In one exemplary implementation, the identification of open-source standard code involves three dimensions of analysis to accurately locate it: First, parsing the import statements in the project to identify the imported external library class paths; second, reading dependency management files (such as Maven's pom.xml and Gradle's build scripts) to extract declared third-party dependency information; and third, comparing the code segments with a predefined list of well-known open-source libraries (including Apache Commons, Guava, Spring Framework core components, etc.) and matching class paths and dependency identifiers to ultimately determine the code segments belonging to open-source standard libraries.

[0031] Specifically, the process involves collecting global project information: traversing all source code files and extracting all import statements; reading dependency management files (Maven / Gradle) in the project root directory and parsing the declared dependency coordinates (such as groupId, artifactId, and version); establishing a matching rule base: predefining characteristic information of well-known open-source libraries, including library name, core package path (such as org.springframework for Spring, com.google.common for Guava), and dependency coordinate templates; and performing matching judgment: comparing the collected import statement package paths and dependency coordinates with the rule base. If the import source or dependency affiliation of a code segment matches the characteristics of an open-source library, then the code segment is determined to be "open-source standard code".
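The matching judgment described above can be illustrated with a minimal sketch. It covers only the import-path dimension and assumes a tiny hard-coded rule base. The class name `OpenSourceMatcher`, the prefix list, and the `org.apache.commons.` entry are illustrative assumptions rather than part of the application; a fuller version would also compare dependency coordinates parsed from pom.xml or Gradle build scripts.

```java
import java.util.List;

/** Hypothetical rule base: package prefixes of well-known open-source libraries. */
final class OpenSourceMatcher {
    // Spring and Guava prefixes come from the text; the Apache Commons prefix is an assumption.
    static final List<String> KNOWN_PREFIXES = List.of(
            "org.springframework.", "com.google.common.", "org.apache.commons.");

    /** Returns true when the target of an import statement falls under a known library prefix. */
    static boolean isOpenSourceImport(String importStatement) {
        String target = importStatement.trim();
        if (target.startsWith("import ")) {
            target = target.substring("import ".length()).trim();
        }
        if (target.startsWith("static ")) {
            target = target.substring("static ".length()).trim();
        }
        if (target.endsWith(";")) {
            target = target.substring(0, target.length() - 1).trim();
        }
        for (String prefix : KNOWN_PREFIXES) {
            if (target.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

A code segment whose imports all resolve to such prefixes would be a candidate for the "open-source standard code" category, while an import such as `com.company.project.biz.core.OrderService` would not match.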

[0032] Regarding the identification of custom core code: If any of the following conditions are met, it can be identified as custom core code: 1. It implements the main business logic (such as core processes such as order creation and payment settlement), is frequently called by other modules in the project, is located in the critical execution path of the business, and has specific framework annotations (such as the @Service service layer annotation and @Repository data access layer annotation in Spring); 2. It can be determined through static analysis techniques (such as calculating the call graph centrality and cyclomatic complexity); 3. The determination can be completed by the developer's preset business rules (such as specifying the core business package path).

[0033] Specifically, business rule verification: checks whether the code belongs to the developer's pre-defined core business package (e.g., com.company.project.biz.core) and whether it implements core business interfaces (e.g., order processing, user authentication related interfaces). Annotation recognition: detects whether classes or methods have core framework annotations (e.g., Spring's @Service, @Repository, @Controller, or custom @CoreModule annotations). Static technical analysis: calculates call graph centrality (call frequency, call chain criticality) and cyclomatic complexity (core business logic is usually more complex) using code analysis tools. If the indicators exceed preset thresholds (e.g., top 30% of call frequency, cyclomatic complexity ≥ 10), it is determined to be core code.
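As a rough illustration of how the three checks above might be combined, the sketch below treats them as alternatives, following the "any of the following conditions" wording of paragraph [0032]. The class name `CoreCodeRules`, the method signature, and the choice to combine the two static metrics with a logical OR are assumptions; the metric values are expected to come from an external analysis tool that is not shown.

```java
import java.util.Set;

/** Hypothetical combination of the core-code checks; thresholds mirror the examples in the text. */
final class CoreCodeRules {
    static final Set<String> CORE_ANNOTATIONS =
            Set.of("Service", "Repository", "Controller", "CoreModule");
    static final String CORE_PACKAGE = "com.company.project.biz.core";
    static final int COMPLEXITY_THRESHOLD = 10;
    // "Top 30% of call frequency" expressed as a percentile rank of at least 0.70.
    static final double CALL_FREQUENCY_PERCENTILE = 0.70;

    static boolean isCore(String packageName, Set<String> annotations,
                          int cyclomaticComplexity, double callFrequencyPercentile) {
        // Business rule verification: the code lives in the preset core business package.
        boolean inCorePackage = packageName.equals(CORE_PACKAGE)
                || packageName.startsWith(CORE_PACKAGE + ".");
        // Annotation recognition: the class or method carries a core framework annotation.
        boolean hasCoreAnnotation = annotations.stream().anyMatch(CORE_ANNOTATIONS::contains);
        // Static analysis: either metric reaching its threshold counts (an assumption).
        boolean exceedsMetrics = callFrequencyPercentile >= CALL_FREQUENCY_PERCENTILE
                || cyclomaticComplexity >= COMPLEXITY_THRESHOLD;
        return inCorePackage || hasCoreAnnotation || exceedsMetrics;
    }
}
```

Whether the static metrics should be required jointly or individually is not stated in the text, so a stricter deployment could replace the OR with an AND.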

[0034] Regarding the identification of custom non-core code: For custom code that does not meet the above criteria for core code, further confirmation is made in conjunction with the functional scenario: if the code function is auxiliary (such as data format conversion, file upload and download tools, log encapsulation) and is only used in specific edge scenarios (such as temporary data export, test auxiliary methods), then it is finally determined to be "custom non-core code".

[0035] Step S202: Obtain the API signature of the open-source standard code and the first source code of the custom core code;

[0036] In one exemplary implementation, regarding obtaining the API signature of open-source standard code: First, based on the code classification results, the class or method corresponding to the open-source standard code is identified. Then, key identifying information is extracted by parsing the code's syntax structure (e.g., using a JavaParser tool). The API signature must include core elements such as the fully qualified class name, method name, parameter type list, and return type to ensure it uniquely identifies the functional interface of the open-source API. For example, if the project introduces the Apache Commons StringUtils class, its open-source standard code API signature can be extracted as: "org.apache.commons.lang3.StringUtils.isEmpty (String str):boolean", which clearly identifies the fully qualified path of the class, the method name, and also marks the parameter types and return type, allowing large language models to quickly identify the purpose and calling method of the API.
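The signature layout described above (fully qualified class name, method name, parameter type list, return type) can be sketched as follows. Note that this sketch swaps in Java reflection instead of the source-level parsing the text mentions, so it only works for classes already on the classpath and emits parameter types without parameter names. The class name `ApiSignatureFormatter` is hypothetical.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.stream.Collectors;

/** Hypothetical formatter that renders a method as "fqcn.method(ParamTypes):ReturnType". */
final class ApiSignatureFormatter {

    static String format(Method method) {
        String params = Arrays.stream(method.getParameterTypes())
                .map(Class::getSimpleName)
                .collect(Collectors.joining(", "));
        return method.getDeclaringClass().getName() + "." + method.getName()
                + "(" + params + "):" + method.getReturnType().getSimpleName();
    }

    /** Convenience overload that looks up a public method by name and parameter types. */
    static String format(Class<?> owner, String methodName, Class<?>... parameterTypes) {
        try {
            return format(owner.getMethod(methodName, parameterTypes));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such public method: " + methodName, e);
        }
    }
}
```

For example, `ApiSignatureFormatter.format(String.class, "startsWith", String.class)` yields `java.lang.String.startsWith(String):boolean`. The same call could be applied to a third-party class such as a string utility once that library is available at runtime.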

[0037] Regarding obtaining the first source code of custom core code: For classes or methods identified as custom core code, their complete source code is extracted directly, including all key code details such as class annotations, member variable definitions, method implementations, and exception handling logic. For example, the OrderService class with the @Service annotation in the project is custom core code. Its first source code includes the complete class definition (including the @Service annotation) and the full implementation of the createOrder method: parameter validation, database interaction, transaction control, and all other implementation code. This ensures that the large language model can accurately grasp the implementation details of the project's core business logic, providing a complete basis for generating business code that fits the actual project.

[0038] Step S203: Obtain the comment information of the custom non-core code, the second source code of the first code, and the deep summary of the second code;

[0039] In one exemplary implementation, for example, the comments of custom non-core code are checked. If they exist, the main description, parameter description, and return value description are extracted. If they do not exist, a large language model (LLM) interaction (e.g., sending method signatures and key fragments + prompts to obtain a response) or static complexity metric calculation is used to distinguish between easily understandable first code and difficult-to-understand second code. The complete source code of the first code and the key statement sequence and data transformation path of the second code are then obtained to generate a deep summary. Therefore, effective information about non-core code is provided on demand. Documentation is directly reused when comments are available, and differentiated processing based on comprehension difficulty is used when no comments are available. This ensures that the large language model can understand the functionality of non-core code while avoiding the waste of context resources from complete source code.

[0040] Step S204: Integrate the API signature, first source code, comment information, second source code, and deep digest into a structured context package;

[0041] In one exemplary implementation, the API signature and the first source code of the custom core code obtained in step S202 are integrated with the comment information, the second source code, and the deep summary obtained in step S203 into a structured context package in a unified format, ensuring clear information classification and logical coherence. Because scattered key information is organized into a single input, the large language model does not need to filter out invalid information, which reduces its understanding cost. At the same time, the structured format consumes fewer tokens than the complete source code, solving the problem of low utilization of context resources.
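The application does not specify the concrete layout of the structured context package. The following is a minimal sketch of one possible layout, assuming a Markdown-like text format with one titled section per information type. The class name `ContextPackageBuilder`, the section titles, and the decision to omit empty sections are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical builder that groups the five kinds of information into titled sections. */
final class ContextPackageBuilder {
    private final List<String> sections = new ArrayList<>();

    /** Adds one titled section; empty sections are skipped so they cost no tokens. */
    ContextPackageBuilder add(String title, List<String> entries) {
        if (entries.isEmpty()) {
            return this;
        }
        StringBuilder sb = new StringBuilder("## ").append(title).append('\n');
        for (String entry : entries) {
            sb.append("- ").append(entry).append('\n');
        }
        sections.add(sb.toString());
        return this;
    }

    /** Joins the sections with a blank line between them. */
    String build() {
        return String.join("\n", sections);
    }
}
```

A caller would chain one `add` call per category (API signatures, custom core source, comment information, second source code, deep summaries) and pass the result of `build()` to whatever interface sends text to the large language model.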

[0042] Step S205: The structured context package is sent to the large language model so that the large language model can generate business code based on the structured context package.

[0043] In one exemplary implementation, for example, a structured context package is used to replace the table structure and prompt words in related technologies, so that large language models can quickly grasp various key information of project code based on accurate, regular and token-saving context. The generated business code has stronger integration with the actual logic of the project and existing code, which solves the core problem of business code generation being disconnected from the project and inefficient in related technologies.

[0044] Through steps S201 to S205, the project source code is first classified into three categories: open-source standard code, custom core code, and custom non-core code. Then, key information is obtained differently for each type of code, namely, the API signature of the open-source standard code, the first source code of the custom core code, the comment information of the custom non-core code, the corresponding source code, and the deep summary. Subsequently, this information is integrated into a structured context package, and this structured context package replaces the table structure and prompt words of the relevant technologies and is sent to the large language model. This structured context package is obtained based on the project source code and consumes fewer tokens from the large language model than the complete source code, while being more interpretable. It solves the core problem that the large language model has difficulty obtaining accurate class code functions and avoids the problem of wasting context resources. In this way, it effectively solves the problem of low efficiency in business code generation caused by the difficulty in obtaining accurate class code functions in the relevant technologies, and achieves the effect of improving the efficiency of business code generation.

[0045] In one implementation, obtaining the comment information of custom non-core code, the second source code of the first code, and the deep summary of the second code includes: checking the code comments in the custom non-core code;

[0046] Specifically, if code comments exist in the custom non-core code, the custom non-core code is identified as the first custom non-core code, and the comment information of the first custom non-core code is obtained. The comment information includes: main description, parameter description, and return value description. Alternatively, if code comments do not exist in the custom non-core code, the custom non-core code is identified as the second custom non-core code, and the second source code and a deep summary of the second code in the second custom non-core code are obtained.

[0047] In one exemplary implementation, for example, using Javadoc comments, the system can check whether Javadoc comments exist in the custom non-core code. Specifically, if code comments exist in the custom non-core code, it is determined to be the first custom non-core code, and the comment information of the first custom non-core code is obtained. The comment information can include a main description, parameter description, return value description, etc. If no code comments exist in the custom non-core code, it is determined to be the second custom non-core code, and the second source code and a deep summary of the second code within the second custom non-core code are obtained.
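The comment check and extraction described above can be sketched with plain string handling. This is a simplified illustration that assumes single-line `@param` and `@return` tags and ignores continuation lines after the first tag; the class name `JavadocInfo` and its fields are hypothetical. A production implementation would more likely rely on a Java source parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder for the main description, parameter descriptions, and return description. */
final class JavadocInfo {
    String description = "";
    final Map<String, String> params = new LinkedHashMap<>();
    String returns = "";

    /** Parses a raw Javadoc block; returns null when the text is not a Javadoc comment. */
    static JavadocInfo parse(String comment) {
        if (comment == null) {
            return null;
        }
        String trimmed = comment.trim();
        if (trimmed.length() < 5 || !trimmed.startsWith("/**") || !trimmed.endsWith("*/")) {
            return null;
        }
        String body = trimmed.substring(3, trimmed.length() - 2);
        JavadocInfo info = new JavadocInfo();
        StringBuilder desc = new StringBuilder();
        boolean seenTag = false;
        for (String rawLine : body.split("\n")) {
            String line = rawLine.trim();
            if (line.startsWith("*")) {
                line = line.substring(1).trim(); // drop the leading asterisk of each line
            }
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("@")) {
                seenTag = true;
                if (line.startsWith("@param ")) {
                    String[] parts = line.substring(7).trim().split("\\s+", 2);
                    info.params.put(parts[0], parts.length > 1 ? parts[1] : "");
                } else if (line.startsWith("@return")) {
                    info.returns = line.substring(7).trim();
                }
            } else if (!seenTag) {
                if (desc.length() > 0) {
                    desc.append(' ');
                }
                desc.append(line);
            }
        }
        info.description = desc.toString();
        return info;
    }
}
```

A `null` result corresponds to the "no code comments" branch, in which the method would be treated as second custom non-core code.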

[0048] Figure 3 is a flowchart of a method for obtaining the second source code of the first code and a deep summary of the second code in the second custom non-core code, according to an embodiment of this application. As shown in Figure 3, in one implementation, obtaining the second source code of the first code and a deep summary of the second code in the second custom non-core code includes:

[0049] Step S301: Determine the first code and the second code based on the understanding of the second custom non-core code by the large language model;

[0050] In one exemplary implementation, the degree to which a large language model understands the second custom non-core code can be determined in two ways in order to distinguish the first code from the second code. Method 1: the method signature of the code (e.g., "public static Map<String,Object> formatData(List<String> rawData)") and key code fragments (e.g., data initialization, core call statements) are sent to a large language model, accompanied by the prompt "Please summarize the core function of this method; if you cannot be clear, please explain." The level of understanding is determined based on the accuracy of the response and the presence of ambiguity. Method 2: static complexity metrics of the code (such as cyclomatic complexity and nesting depth) are calculated. If the cyclomatic complexity is less than 8 (a preset threshold), the code is assigned the first level of understanding; otherwise, it is assigned the second level of understanding, thus distinguishing the first code from the second code. This approach identifies how the large language model's understanding differs across non-core code, avoids the information redundancy or insufficiency caused by uniform processing, and lays the foundation for subsequently providing targeted information.
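The static-metric route can be illustrated with a deliberately crude sketch. It approximates cyclomatic complexity by counting decision keywords and short-circuit operators in the method text, which will miscount tokens inside strings or comments and ignores ternary expressions; a real implementation would compute the metric from a parsed syntax tree. The class name `ComplexityGate` and the integer level codes are assumptions, while the threshold of 8 and the mapping of levels follow the paragraph above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical gate that maps an approximate cyclomatic complexity to an understanding level. */
final class ComplexityGate {
    static final int THRESHOLD = 8; // preset threshold named in the text

    private static final Pattern DECISION_POINT =
            Pattern.compile("\\b(if|for|while|case|catch)\\b|&&|\\|\\|");

    /** Counts decision points textually and adds one for the method's single entry path. */
    static int approximateCyclomaticComplexity(String methodSource) {
        Matcher matcher = DECISION_POINT.matcher(methodSource);
        int complexity = 1;
        while (matcher.find()) {
            complexity++;
        }
        return complexity;
    }

    /** Returns 1 (first level) when complexity is below the threshold, otherwise 2 (second level). */
    static int understandingLevel(String methodSource) {
        return approximateCyclomaticComplexity(methodSource) < THRESHOLD ? 1 : 2;
    }
}
```

For instance, a body containing one `if` with an `&&` condition and one `for` loop scores 4 and is assigned level 1.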

[0051] Step S302: If the level of understanding is the first level of understanding, obtain the second source code;

[0052] In one exemplary implementation, for example, when the level of understanding is determined to be a first level of understanding (the large language model struggles to accurately understand the code's functionality), the second source code (i.e., the complete source code of the first code) is obtained. For instance, if a second custom non-core code is a simple string concatenation helper method, and the large language model replies "may be used for string processing," the vague description is judged as a first level of understanding, and the complete source code of that method (including parameter validation, concatenation logic, return results, etc.) is directly extracted. Therefore, providing the large language model with complete code implementation details compensates for its insufficient understanding, ensuring that the large language model can accurately grasp the functional logic of the code and avoids affecting business code generation due to misunderstanding biases.

[0053] Step S303: Obtain a deep summary when the level of understanding is the second level of understanding value; wherein the first level of understanding value is lower than the second level of understanding value.

[0054] In one exemplary implementation, when the level of understanding is determined to be the second level of understanding (the large language model can better grasp the core logic of the code), the third source code (the complete source code) of the second code is first obtained. A parsing tool then extracts the key statement sequence (such as "reading the configuration file, then verifying the validity of parameters, then calling the data conversion interface, then returning the formatted result") and the main data conversion path (such as "List<String> to JSON object to Map<String, Object>"). Based on this structured information, a deep summary is generated (e.g., "The formatData method receives a list of strings rawData, validates the parameters, calls the convert interface to format the data into a key-value pair Map and returns it"). Replacing lengthy source code with a concise deep summary significantly reduces the context tokens consumed by the large language model, while allowing it to quickly grasp the core functionality of the code, avoid interference from irrelevant code details, and use its context more efficiently.
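The text does not say how the deep summary is produced from the extracted information. As one possibility, the sketch below renders the key statement sequence and the data conversion path into a single compact line using a fixed template. The class name `DeepSummary`, the template, and the `->` separator are assumptions; a natural-language summary like the example above could instead be generated by a separate summarization step.

```java
import java.util.List;

/** Hypothetical template-based renderer for a deep summary. */
final class DeepSummary {
    /** Joins the ordered key statements and the data conversion path into one summary line. */
    static String render(String methodName, List<String> keyStatements, List<String> dataPath) {
        return methodName + ": steps [" + String.join(" -> ", keyStatements)
                + "]; data path [" + String.join(" -> ", dataPath) + "]";
    }
}
```

Used with the example from the text, `render("formatData", List.of("read config", "validate parameters"), List.of("List<String>", "JSON object", "Map<String, Object>"))` produces a one-line summary that is far shorter than the method body it stands in for.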

[0055] Through steps S301 to S303, the understanding boundaries of the large language model for the second custom non-core code are first accurately identified. Then, complete source code is provided for code that is difficult to understand, and in-depth summaries are provided for code that is easy to understand. This ensures that the large language model can accurately grasp the functional logic of the non-core code, while saving context resources to the greatest extent. It effectively solves the problems in related technologies where either providing complete source code wastes tokens or insufficient information leads to misunderstandings in the large language model. This provides strong support for the subsequent construction of structured context packages and the generation of high-quality business code.

[0056] Figure 4 is a flowchart of a method for determining the first code and the second code based on the large language model's level of understanding of the second custom non-core code, according to an embodiment of this application. As shown in Figure 4, in one implementation, determining the first code and the second code based on the large language model's level of understanding of the second custom non-core code includes:

[0057] Step S401: Send the method signature and key code snippets of the second custom non-core code to the large language model;

[0058] In one exemplary implementation, for example, key identifiers and core fragments are extracted from the second custom non-core code: the method signature is obtained first (such as "public static List<String> filterInvalidData(List<String> dataList, Set<String> validKeys)"), and key code segments are then extracted (such as the key statements in the parameter validation logic and the core data filtering logic, rather than the complete method body). These are packaged together and sent to the target large language model (such as GPT-4 or Llama3) through a large language model interface. Only the core information needed for the large language model's judgment is therefore transmitted, which avoids spending context tokens on the complete source code. At the same time, the large language model can still form an initial picture of the code's functionality from the method signature and key segments, providing sufficient basis for the subsequent understanding assessment.
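As a rough illustration of step S401, the following Python sketch separates a Java method into its signature and a handful of key lines. The function name `extract_signature_and_snippets`, the keyword list, the `max_snippets` limit, and the body of the sample method are assumptions made for this example; the application does not specify how the key code segments are selected.

```python
import re

# Hypothetical heuristic: lines starting with one of these keywords count as "key" lines.
KEY_LINE = re.compile(r"^(if|else|for|while|switch|return|throw)\b")


def extract_signature_and_snippets(method_source: str, max_snippets: int = 5):
    """Split a Java method into (signature, key lines) without sending the full body."""
    header, _, body = method_source.partition("{")
    signature = " ".join(header.split())  # collapse whitespace and newlines
    snippets = []
    for raw in body.splitlines():
        line = raw.strip().lstrip("}").strip()
        if KEY_LINE.match(line):
            snippets.append(line.rstrip("{").strip())
    return signature, snippets[:max_snippets]


# Sample method: the signature comes from the text above, the body is invented.
SAMPLE_METHOD = """
public static List<String> filterInvalidData(List<String> dataList, Set<String> validKeys) {
    List<String> result = new ArrayList<>();
    if (dataList == null) {
        return result;
    }
    for (String item : dataList) {
        if (validKeys.contains(item)) {
            result.add(item);
        }
    }
    return result;
}
"""

signature, snippets = extract_signature_and_snippets(SAMPLE_METHOD)
print(signature)
print(snippets)
```

A production implementation would more likely select statements from a parsed syntax tree; the line-based heuristic here only shows the shape of the payload.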

[0059] Step S402: Obtain the response content of the large language model based on the prompt words;

[0060] In one exemplary implementation, for example, targeted prompts are designed to guide the large language model in providing feedback on its understanding. An example prompt might be, "Based on the provided method signature and code snippet, please summarize the core function of the method in one sentence; if you cannot clearly determine the function, have doubts, or lack sufficient information, please directly state 'I cannot accurately understand'." This prompt is sent along with the packaging information from step S401. The response from the large language model is received and recorded (e.g., "This method receives a data list and a set of valid keys, filters out data containing valid keys, and returns it" or "I cannot accurately understand"). Therefore, by using explicit prompts to standardize the response format and content of the large language model, overly generalized responses or deviations from the judgment requirements are avoided, ensuring that the level of understanding of the large language model can be directly determined based on the response.
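The request described in steps S401 and S402 could be assembled along the following lines. This is a minimal sketch: the prompt wording is the example quoted above, while the function name `build_understanding_request`, the layout of the request, and the two sample snippets are assumptions for illustration. The call to the large language model itself is omitted because the application does not name a specific interface.

```python
# Prompt wording taken from the example prompt in paragraph [0060].
UNDERSTANDING_PROMPT = (
    "Based on the provided method signature and code snippet, please summarize "
    "the core function of the method in one sentence; if you cannot clearly "
    "determine the function, have doubts, or lack sufficient information, "
    "please directly state 'I cannot accurately understand'."
)


def build_understanding_request(signature: str, key_snippets: list[str]) -> str:
    """Combine the S401 payload with the S402 prompt into one request string."""
    snippet_block = "\n".join(f"- {s.strip()}" for s in key_snippets)
    return (
        f"{UNDERSTANDING_PROMPT}\n\n"
        f"Method signature:\n{signature.strip()}\n\n"
        f"Key code snippets:\n{snippet_block}"
    )


# The snippets below are invented for the example.
request = build_understanding_request(
    "public static List<String> filterInvalidData(List<String> dataList, Set<String> validKeys)",
    [
        "if (dataList == null || validKeys == null) return List.of();",
        "if (validKeys.contains(item)) result.add(item);",
    ],
)
print(request)
```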

[0061] Step S403: Determine the first code and the second code in the second custom non-core code based on the response content.

[0062] In one exemplary implementation, for example, a response judgment rule can be set. If the response from the large language model accurately summarizes the core function of the method and is unambiguous (such as the response "filtering data containing valid keys"), then the large language model's understanding level is determined to be the second understanding value, and the corresponding code segment is the second code. If the response is "cannot be accurately understood" or is ambiguous and deviates significantly from the actual function of the code (such as mistakenly stating "sorting data" instead of "filtering invalid data"), then the understanding level is determined to be the first understanding value, and the corresponding code segment is the first code. Therefore, quickly and accurately distinguishing between code segments that the large language model can effectively understand and those that are difficult to understand provides a clear basis for differentiated processing in subsequent steps such as "providing complete source code" or "generating deep summaries," avoiding resource waste or misunderstanding caused by indiscriminate processing.
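The response judgment rule can be sketched as a small function. The application does not say how "accurately summarizes" is checked, so the keyword comparison below is only a stand-in assumption, as are the names `classify_by_response` and `expected_keywords` and the labels `"first_code"` and `"second_code"`.

```python
# Marker derived from the fallback answer requested in the example prompt.
REFUSAL_MARKER = "cannot accurately understand"


def classify_by_response(response: str, expected_keywords: set[str]) -> str:
    """Return "second_code" for a usable summary, otherwise "first_code"."""
    text = response.lower()
    if REFUSAL_MARKER in text:
        return "first_code"  # model declined: provide the complete source code
    # Assumed accuracy check: the summary must mention at least one expected keyword.
    if not any(keyword.lower() in text for keyword in expected_keywords):
        return "first_code"  # summary deviates from the actual function
    return "second_code"  # summary is usable: a deep summary is sufficient


print(classify_by_response(
    "This method receives a data list and a set of valid keys, "
    "filters out data containing valid keys, and returns it",
    {"filter"},
))
print(classify_by_response("I cannot accurately understand", {"filter"}))
print(classify_by_response("This method sorts the data", {"filter"}))
```

In practice the expected keywords would have to come from some trusted description of the method, which is exactly what uncommented code lacks, so a real system would need a different accuracy check.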

[0063] Through steps S401 to S403, context resources are saved by transmitting key information instead of complete source code. Targeted prompts guide the large language model to provide effective responses, and comprehension level is determined based on clear rules. The entire process ensures the accuracy of the large language model's comprehension level judgment while avoiding unnecessary information transmission. It efficiently distinguishes between the first and second code, providing a reliable prerequisite for the subsequent extraction of differentiated information from the second custom non-core code. This further supports the optimized construction of structured context packages, helping to improve the efficiency and quality of business code generation.

[0064] Figure 5 is a flowchart of another method for determining the first code and the second code based on the large language model's level of understanding of the second custom non-core code, according to an embodiment of this application. As shown in Figure 5, in one implementation, determining the first code and the second code based on the large language model's level of understanding of the second custom non-core code includes:

[0065] Step S501: Calculate the static complexity index of the second custom non-core code;

[0066] In one exemplary implementation, for example, static code analysis tools (such as JavaParser or CheckStyle) are used to scan the second custom non-core code and calculate key static complexity metrics, including cyclomatic complexity (the number of conditional statements, loops, and other branches in the code), nesting depth (such as the nesting levels of if-else statements and for loops), and lines of code (the number of effective lines of code in the method body). For example, if a method body of a second custom non-core code contains two if statements and one for loop, its cyclomatic complexity is calculated to be 3; its nesting depth is 2 levels; and its effective lines of code are 25. These data constitute the static complexity metrics of the code. Therefore, by using objectively quantified metrics to reflect the complexity of the code, the bias of subjective judgment is avoided, providing an accurate and quantifiable basis for subsequently determining the understanding level of large language models.
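The three metrics can be approximated without a full parser. The sketch below is a text-based stand-in for the static analysis tools named above, not their API: it counts branch keywords, tracks brace depth, and counts non-blank, non-comment lines. It follows the worked example in this paragraph, where two if statements and one for loop give a value of 3; McCabe's conventional cyclomatic complexity would add 1 to that count. The function name, the keyword list, and the sample method body are assumptions.

```python
import re

BRANCH_KEYWORDS = re.compile(r"\b(if|for|while|case|catch)\b")
COMMENTS = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def static_complexity(method_source: str) -> dict[str, int]:
    """Approximate the static complexity metrics of one Java method.

    Limitations of this sketch: string literals containing braces or "//"
    are not handled, and control statements without braces add no depth.
    """
    code = COMMENTS.sub("", method_source)
    branch_count = len(BRANCH_KEYWORDS.findall(code))

    depth = max_depth = 0
    for ch in code:
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
    nesting_depth = max(max_depth - 1, 0)  # exclude the method's own braces

    # Counts every non-blank line left after comment removal,
    # including the signature line and closing braces.
    effective_lines = sum(1 for line in code.splitlines() if line.strip())
    return {
        "branch_count": branch_count,
        "nesting_depth": nesting_depth,
        "effective_lines": effective_lines,
    }


# Invented sample with two if statements and one for loop.
SAMPLE = """
public static List<String> filterInvalidData(List<String> dataList, Set<String> validKeys) {
    List<String> result = new ArrayList<>();
    if (dataList == null) {
        return result;
    }
    for (String item : dataList) {
        // keep only items whose key is valid
        if (validKeys.contains(item)) {
            result.add(item);
        }
    }
    return result;
}
"""

print(static_complexity(SAMPLE))
```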

[0067] Step S502: If the static complexity index is less than a preset threshold, determine the level of understanding as the first level of understanding value, and determine the code segment corresponding to the first level of understanding value as the first code;

[0068] In one exemplary implementation, for example, static complexity thresholds are preset, such as a cyclomatic complexity threshold of 8, a nesting depth threshold of 3 levels, and a code line count threshold of 50 lines. When all static complexity indicators of the second custom non-core code are less than the preset thresholds (e.g., cyclomatic complexity: 3 < 8, nesting depth: 2 < 3, code line count: 25 < 50), the level of understanding is determined to be the first level of understanding value (easy to understand), and the corresponding code segment is designated as the first code. Code with a simple structure and clear logic is thereby selected quickly, establishing that the large language model can understand this type of code without additional assistance. This lays the foundation for directly providing the source code and avoids unnecessary deep processing.

[0069] Step S503: If the static complexity index is greater than or equal to a preset threshold, determine the level of understanding as the second level of understanding value, and determine the code segment corresponding to the second level of understanding value as the second code.

[0070] In one exemplary implementation, for example, if any of the static complexity metrics of the second custom non-core code is greater than or equal to its preset threshold (for instance, a cyclomatic complexity of 10 ≥ 8, a nesting depth of 4 ≥ 3, or a line count of 60 ≥ 50), the level of understanding is determined to be the second level of understanding value (difficult to understand), and the corresponding code segment is identified as the second code. This identifies logically complex and structurally intricate code and establishes that such code requires a deep summary to aid the large language model's understanding, avoiding the misunderstandings or wasted context resources caused by directly providing complex source code.
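Steps S502 and S503 reduce to a threshold comparison. The sketch below uses the example thresholds from the text (8, 3, and 50); the dictionary keys, the function name `classify_by_complexity`, and the return labels are assumptions for illustration.

```python
# Example thresholds from paragraph [0068]; real deployments would tune these.
THRESHOLDS = {"cyclomatic_complexity": 8, "nesting_depth": 3, "lines_of_code": 50}


def classify_by_complexity(metrics: dict[str, int]) -> str:
    """Return "first_code" if every metric is below its threshold, else "second_code"."""
    if all(metrics[name] < limit for name, limit in THRESHOLDS.items()):
        return "first_code"  # S502: source code is provided directly
    return "second_code"  # S503: a deep summary is generated instead


simple = {"cyclomatic_complexity": 3, "nesting_depth": 2, "lines_of_code": 25}
complex_ = {"cyclomatic_complexity": 10, "nesting_depth": 2, "lines_of_code": 25}
print(classify_by_complexity(simple))
print(classify_by_complexity(complex_))
```

Because a single metric at or above its threshold is enough to select the second code, a value exactly equal to a threshold (for example a nesting depth of 3) also falls into the second category.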

[0071] Through steps S501 to S503, classification can be completed quickly through static analysis, without interacting with the large language model, which improves processing efficiency. Because the judgment is based on quantitative indicators, it distinguishes consistently between code segments that are easy for the large language model to understand and those that are difficult. This provides a clear basis for the subsequent differentiated handling, in which either source code is provided or a deep summary is generated, supporting the optimized construction of the structured context package and helping to improve the quality and efficiency of business code generation.

[0072] Figure 6 is a flowchart of obtaining the deep summary when the level of understanding is the second level of understanding value, according to an embodiment of this application. As shown in Figure 6, in one implementation, obtaining the deep summary when the level of understanding is the second level of understanding value includes:

[0073] Step S601: Obtain the third source code of the second code;

[0074] In one exemplary implementation, for example, the complete source code (i.e., the third source code) of the second code deemed "difficult to understand" by a large language model understanding assessment is located and extracted from the project codebase. The extraction process ensures that all key components of the custom non-core method are included, such as the method signature, parameter definitions, method body implementation, exception declarations, and related dependency references, without any missing code snippets. Therefore, complete and intact raw material is provided for subsequent structured information generation, avoiding deviations in core logic or data flow analysis due to incomplete source code, and laying a solid foundation for accurate subsequent parsing.

[0075] Step S602: Generate structured code information based on the third source code; wherein, the structured code information includes: key statement sequence and main data transformation path;

[0076] In one exemplary implementation, structured code information is generated through a dedicated parsing process. For example, a Java parser such as JavaParser or Spoon first performs syntactic analysis on the third source code and constructs the corresponding abstract syntax tree (AST). Key information is then mined from the AST. From the perspective of control flow, the core logical structures (such as loops and conditional branches), key API calls, and modifications of member variables or parameters within the method body are identified, forming an ordered sequence of key statements. From the perspective of data flow, the way input parameters are passed and processed within the method, how intermediate results are transformed, and how the final return value is produced are traced, clarifying the main data transformation path. If necessary, a simplified control flow graph (CFG) can be constructed to help organize the logic. Finally, this information is integrated into structured code information containing the key statement sequence and the main data transformation path. Complex and lengthy source code is thereby transformed into concise, well-organized structured information that strips away redundant code and focuses on core logic and data flow. This reduces the difficulty of the subsequent deep summarization and allows the large language model to quickly and accurately capture the core information of the code without interference from irrelevant details.
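To make the shape of the structured code information concrete, the following sketch records control-flow statements, calls, and returns in source order. It is a line-based heuristic standing in for the AST traversal described above and does not use JavaParser or Spoon. The class `StructuredCodeInfo`, the labels such as `call:` and `branch:`, and the sample method body are assumptions; the data path is filled in by hand from the example in paragraph [0054] rather than derived by data-flow analysis.

```python
import re
from dataclasses import dataclass, field

CONTROL = re.compile(r"^(?:else\s+)?(if|for|while|switch|try|catch)\b")
RETURN = re.compile(r"^return\b")
CALL = re.compile(r"\b([A-Za-z_][\w.]*)\s*\(")


@dataclass
class StructuredCodeInfo:
    key_statements: list[str] = field(default_factory=list)
    data_path: list[str] = field(default_factory=list)


def extract_key_statements(method_body: str) -> list[str]:
    """Label each line as a branch, a return, or a call, keeping source order."""
    steps = []
    for raw in method_body.splitlines():
        line = raw.strip().lstrip("}").strip()
        control = CONTROL.match(line)
        if control:
            steps.append(f"branch:{control.group(1)}")
        elif RETURN.match(line):
            steps.append("return")
        else:
            call = CALL.search(line)  # first call on the line only
            if call:
                steps.append(f"call:{call.group(1)}")
    return steps


# Invented method body for the example.
BODY = """
validate(rawData);
if (rawData.isEmpty()) {
    return Map.of();
}
Map<String, Object> result = converter.convert(rawData);
return result;
"""

info = StructuredCodeInfo(
    key_statements=extract_key_statements(BODY),
    data_path=["List<String>", "JSON object", "Map<String, Object>"],
)
print(info)
```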

[0077] Step S603: Generate a deep summary based on the structured code information.

[0078] In one exemplary implementation, for example, the structured code information (the key statement sequence, the main data transformation path, etc.) is used as input, and predefined templates or rule-based natural language generation techniques are used to describe the core elements of the method. The description states the main function of the method, the required input parameters and their respective uses, and the return value and its meaning, and it annotates key side effects (such as object state modification). If necessary, the wording can be refined with a Transformer model fine-tuned on a code summarization task, making the description more natural and accurate. Transforming complex structured code information into a concise, function-oriented natural language summary thus removes redundant details while preserving the core logic and key information. This allows the large language model to quickly and accurately grasp the function and logic of the non-core code without parsing the original source code, reducing context token consumption, improving context utilization efficiency, and supporting the subsequent construction of the structured context package and the generation of high-quality business code.
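A predefined template of the kind mentioned above might look like the following. This is a minimal sketch assuming a single sentence template; the function name `render_deep_summary` and its parameters are hypothetical, and the sample values paraphrase the formatData example from paragraph [0054].

```python
def render_deep_summary(method_name, params, steps, returns, side_effects=()):
    """Fill a fixed sentence template from structured code information.

    params maps each parameter name to a short description of its use;
    steps is the ordered key statement sequence in plain language.
    """
    param_text = ", ".join(f"{name} ({use})" for name, use in params.items())
    step_text = ", ".join(steps)
    summary = (
        f"The {method_name} method receives {param_text}, "
        f"{step_text}, and returns {returns}."
    )
    if side_effects:
        summary += " Side effects: " + "; ".join(side_effects) + "."
    return summary


summary = render_deep_summary(
    method_name="formatData",
    params={"rawData": "a list of strings"},
    steps=["validates the parameters", "calls the convert interface to format the data"],
    returns="a key-value Map",
)
print(summary)
```

A template keeps the summary length predictable, which matters when the goal is to bound token consumption; the optional model-based rewriting step described above would trade some of that predictability for more natural wording.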

[0079] Through steps S601 to S603, the complete source code of the second code is first accurately extracted to ensure no missing original information. Then, the complex source code is transformed into structured information focusing on core logic and data flow. Finally, a concise and easy-to-understand function-oriented natural language summary is generated, achieving efficient "digestion" and "transformation" of non-core code that is difficult for large language models to understand. This method not only completely solves the problem of wasted context tokens caused by directly providing complete source code, but also makes up for the deficiency of large language models in accurately inferring the functions of complex code without comments. It allows large language models to quickly and accurately grasp core information without parsing redundant code, improving the efficiency of context utilization and providing high-quality input for the subsequent construction of structured context packages. This supports large language models in generating business code that is more in line with the actual project and has more rigorous logic, effectively overcoming the core pain points of low code generation efficiency and disconnect from the project in related technologies.

[0080] From the above description of the embodiments, those skilled in the art can clearly understand that the methods according to the above embodiments can be implemented by software together with a necessary general-purpose hardware platform. They can of course also be implemented in hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence, or the part that contributes over the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, or optical disk) and includes several instructions that cause a terminal device (which may be a mobile phone, computer, server, network device, etc.) to execute the methods described in the various embodiments of this application.

[0081] This application also provides a project source code conversion model for implementing the above embodiments and preferred embodiments; details already described will not be repeated. As used below, the term "module" can refer to a combination of software and / or hardware that performs a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, hardware implementation, or a combination of software and hardware, is also possible and contemplated.

[0082] Figure 7 is a structural block diagram of the project source code conversion model according to an embodiment of this application. As shown in Figure 7, the project source code conversion model includes:

[0083] The code classification module 71 is used to classify the project source code to determine the code type of the code segments in the project source code; the code types include: open source standard code, custom core code, and custom non-core code;

[0084] The first processing module 72 is used to obtain the API signature of the open source standard code and the first source code of the custom core code;

[0085] The second processing module 73 is used to obtain the comment information of the custom non-core code, the second source code of the first code, and the deep summary of the second code;

[0086] The context integration module 74 is used to integrate the API signature, the first source code, the comment information, the second source code, and the deep summary into a structured context package;

[0087] The large language model interface module 75 is used to send the structured context package to the large language model so that the large language model can generate business code based on the structured context package.

[0088] With the above technical solution, the code classification module 71 classifies the project source code into three categories: open source standard code, custom core code, and custom non-core code. This lays the foundation for accurately obtaining the functional information of each code type. The first processing module 72 obtains the API signature of the open source standard code and the first source code of the custom core code. The second processing module 73 obtains the comment information of the custom non-core code, the second source code of the first code, and the deep summary of the second code. Applying a differentiated key information extraction strategy to each type of code ensures that the functional information of each type is obtained accurately, while avoiding the wasted context resources, added cost, and errors that come from directly providing the complete source code. The context integration module 74 integrates the key information obtained above into a structured context package. Compared with scattered information, or with the table structures and prompts used in related technologies, this structured package is easier for the large language model to interpret and consumes fewer tokens. The large language model interface module 75 then sends the structured context package to the large language model, replacing the table structures and prompts used as input in related technologies. This addresses, at the input layer, the core problem that the large language model has difficulty obtaining accurate class code functionality, avoids wasting context resources, and ultimately solves the problem of low business code generation efficiency.
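The structured context package assembled by the context integration module 74 can be pictured as a record with one field per kind of key information. The sketch below is an assumption about its shape: the class name `ContextPackage`, the field names, the JSON serialization, and all sample values are illustrative and are not taken from the application.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ContextPackage:
    api_signatures: list[str] = field(default_factory=list)   # open source standard code
    core_source: list[str] = field(default_factory=list)      # first source code (custom core code)
    comment_info: list[dict] = field(default_factory=list)    # commented custom non-core code
    non_core_source: list[str] = field(default_factory=list)  # second source code (first code)
    deep_summaries: list[str] = field(default_factory=list)   # deep summaries (second code)

    def to_json(self) -> str:
        """Serialize the package so it can be embedded in a request to the model."""
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)


# All sample values below are invented for the example.
package = ContextPackage(
    api_signatures=["public static String trim(String str)"],
    core_source=["public class OrderService { /* full source here */ }"],
    comment_info=[{
        "description": "Filters out invalid data.",
        "params": {"dataList": "raw entries", "validKeys": "allowed keys"},
        "returns": "the valid entries",
    }],
    non_core_source=["static int clamp(int v) { return Math.max(0, v); }"],
    deep_summaries=["The formatData method receives a list of strings and returns a Map."],
)
print(package.to_json())
```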

[0089] In one implementation, the code classification module 71 is further configured to: determine the code type in the project source code based on the file index, wherein the project source code includes multiple code segments, which are stored in different files.

[0090] Figure 8 is a structural block diagram of the second processing module according to an embodiment of this application. As shown in Figure 8, in one embodiment, the second processing module 73 includes: a comment information extraction unit 731, used to check the code comments in the custom non-core code; wherein, if there are code comments in the custom non-core code, the custom non-core code is determined to be the first custom non-core code, and the comment information of the first custom non-core code is obtained, wherein the comment information includes: main description, parameter description, and return value description; if there are no code comments in the custom non-core code, the custom non-core code is determined to be the second custom non-core code, and the second source code of the first code and the deep summary of the second code in the second custom non-core code are obtained.
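The three parts of the comment information map naturally onto a Javadoc comment: the leading description, the `@param` tags, and the `@return` tag. The sketch below assumes the code comments are Javadoc blocks, which the application does not state explicitly; the function name `parse_javadoc` and the sample comment are hypothetical.

```python
import re


def parse_javadoc(comment: str) -> dict:
    """Extract main description, parameter descriptions, and return description.

    Simplification: continuation lines of a multi-line tag are dropped.
    """
    info = {"description": "", "params": {}, "returns": ""}
    description = []
    seen_tag = False
    for raw in comment.strip().splitlines():
        line = re.sub(r"^/\*\*\s*|\s*\*/$", "", raw.strip())  # comment delimiters
        line = re.sub(r"^\*\s?", "", line).strip()            # leading asterisk
        if not line:
            continue
        if line.startswith("@"):
            seen_tag = True
            parts = line.split(None, 2)
            if parts[0] == "@param" and len(parts) >= 2:
                info["params"][parts[1]] = parts[2] if len(parts) == 3 else ""
            elif parts[0] == "@return":
                info["returns"] = line[len("@return"):].strip()
        elif not seen_tag:
            description.append(line)
    info["description"] = " ".join(description)
    return info


# Invented sample comment.
DOC = """/**
 * Filters out entries whose key is not valid.
 *
 * @param dataList the raw entries
 * @param validKeys keys that are allowed
 * @return the entries that have a valid key
 */"""

print(parse_javadoc(DOC))
```

A method for which this returns an empty description and no tags would, under the rule above, be treated as the second custom non-core code.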

[0091] In one implementation, as shown in Figure 8, the second processing module 73 includes: a comprehension evaluation unit 732, used to determine the first code and the second code based on the large language model's comprehension level of the second custom non-core code; wherein, when the comprehension level is a first comprehension value, the second source code is obtained; when the comprehension level is a second comprehension value, the deep summary is obtained; wherein, the first comprehension value is lower than the second comprehension value.

[0092] In one implementation, the comprehension evaluation unit 732 is further configured to: send the method signature and key code snippets of the second custom non-core code to a large language model; obtain the response content of the large language model based on prompt words; and determine the first code and the second code in the second custom non-core code based on the response content.

[0093] In one embodiment, the comprehension evaluation unit 732 is further configured to: calculate the static complexity index of the second custom non-core code; and, if the static complexity index is less than a preset threshold, determine the comprehension level as a first comprehension value, and determine the code segment corresponding to the first comprehension value as the first code; or, if the static complexity index is greater than or equal to the preset threshold, determine the comprehension level as a second comprehension value, and determine the code segment corresponding to the second comprehension value as the second code.

[0094] In one implementation, as shown in Figure 8, the second processing module 73 includes:

[0095] The source code parsing unit 733 is used to obtain the third source code of the second code; and to generate structured code information based on the third source code; wherein, the structured code information includes: key statement sequence and main data transformation path;

[0096] The deep summary generation unit 734 is used to generate deep summaries based on structured code information.

[0097] It should be noted that the above modules can be implemented in software or hardware. In the latter case, they can be implemented in ways including, but not limited to, the following: all the above modules are located in the same processor; or, the above modules are located in different processors in any combination.

[0098] This application also provides a computer-readable storage medium storing a computer program configured to execute the steps in any of the above method embodiments when running.

[0099] In one exemplary embodiment, the aforementioned computer-readable storage medium may include, but is not limited to, various media capable of storing computer programs, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), portable hard disk, magnetic disk, or optical disk.

[0100] This application also provides an electronic device including a memory and a processor, wherein the memory stores a computer program and the processor is configured to run the computer program to perform the steps in any of the above method embodiments.

[0101] In one exemplary embodiment, the electronic device may further include a transmission device and an input / output device, wherein the transmission device is connected to the processor and the input / output device is connected to the processor.

[0102] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the steps in any of the above method embodiments.

[0103] Specific examples in the embodiments of this application can be found in the examples described in the above embodiments and exemplary implementations, and will not be repeated here.

[0104] Obviously, those skilled in the art should understand that the modules or steps of this application described above can be implemented using general-purpose computing devices. They can be centralized on a single computing device or distributed across a network of multiple computing devices. They can be implemented using computer-executable program code, and thus can be stored in a storage device for execution by a computing device. In some cases, the steps shown or described can be performed in a different order than those presented here, or they can be fabricated as separate integrated circuit modules, or multiple modules or steps can be fabricated as a single integrated circuit module. Thus, this application is not limited to any particular combination of hardware and software.

[0105] The above description is merely a preferred embodiment of this application and is not intended to limit this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the principles of this application should be included within the protection scope of this application.

Claims

1. A method for generating business code, characterized in that the method comprises: classifying the project source code to determine the code type of code segments in the project source code, wherein the code type includes: open source standard code, custom core code, and custom non-core code; obtaining the API signature of the open source standard code and the first source code of the custom core code; obtaining the comment information of the custom non-core code, the second source code of the first code, and the deep summary of the second code; integrating the API signature, the first source code, the comment information, the second source code, and the deep summary into a structured context package; and sending the structured context package to a large language model so that the large language model generates business code based on the structured context package.

2. The method according to claim 1, characterized in that obtaining the comment information of the custom non-core code, the second source code of the first code, and the deep summary of the second code comprises: checking the code comments in the custom non-core code; if the code comments exist in the custom non-core code, determining the custom non-core code to be the first custom non-core code, and obtaining the comment information of the first custom non-core code, wherein the comment information includes: main description, parameter description, and return value description; and if the code comments do not exist in the custom non-core code, determining the custom non-core code to be the second custom non-core code, so as to obtain the second source code of the first code and the deep summary of the second code in the second custom non-core code.

3. The method according to claim 2, characterized in that obtaining the second source code of the first code and the deep summary of the second code in the second custom non-core code comprises: determining the first code and the second code based on the large language model's level of understanding of the second custom non-core code; when the level of understanding is a first level of understanding value, obtaining the second source code; and when the level of understanding is a second level of understanding value, obtaining the deep summary; wherein the first level of understanding value is lower than the second level of understanding value.

4. The method according to claim 3, characterized in that determining the first code and the second code based on the large language model's level of understanding of the second custom non-core code comprises: sending the method signature and key code snippets of the second custom non-core code to the large language model; obtaining the response content of the large language model based on the prompt words; and determining the first code and the second code in the second custom non-core code based on the response content.

5. The method according to claim 3, characterized in that determining the first code and the second code based on the large language model's level of understanding of the second custom non-core code comprises: calculating the static complexity index of the second custom non-core code; if the static complexity index is less than a preset threshold, determining the level of understanding to be the first level of understanding value, and determining the code segment corresponding to the first level of understanding value to be the first code; and if the static complexity index is greater than or equal to the preset threshold, determining the level of understanding to be the second level of understanding value, and determining the code segment corresponding to the second level of understanding value to be the second code.

6. The method according to claim 3, characterized in that obtaining the deep summary when the level of understanding is the second level of understanding value comprises: obtaining the third source code of the second code; generating structured code information based on the third source code, wherein the structured code information includes: a key statement sequence and a main data transformation path; and generating the deep summary based on the structured code information.

7. The method according to claim 1, characterized in that classifying the project source code to determine the code type of code segments in the project source code comprises: determining the code type in the project source code based on a file index, wherein the project source code includes multiple code segments, which are stored in different files.

8. A project source code conversion model, characterized in that it comprises: a code classification module, used to classify the project source code to determine the code type of code segments in the project source code, wherein the code type includes: open source standard code, custom core code, and custom non-core code; a first processing module, used to obtain the API signature of the open source standard code and the first source code of the custom core code; a second processing module, used to obtain the comment information of the custom non-core code, the second source code of the first code, and the deep summary of the second code; a context integration module, used to integrate the API signature, the first source code, the comment information, the second source code, and the deep summary into a structured context package; and a large language model interface module, used to send the structured context package to the large language model so that the large language model generates business code based on the structured context package.

9. A computer-readable storage medium, characterized in that the storage medium stores a computer program, wherein the computer program, when executed by a processor, performs the method of any one of claims 1 to 7.

10. An electronic device comprising a memory and a processor, characterized in that the memory stores a computer program, and the processor is configured to run the computer program to perform the method of any one of claims 1 to 7.

11. A computer program product comprising computer instructions, characterized in that the computer instructions, when executed by a processor, implement the method of any one of claims 1 to 7.