Secure AI-assisted code generation system with iterative patching

A framework using AI agents in a secure sandboxed environment iteratively generates and tests code changes, addressing the inefficiencies of current generative models by ensuring functional multi-file patches are produced effectively.

WO2026128608A1 · PCT designated stage · Publication Date: 2026-06-18 · GOOGLE LLC

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
GOOGLE LLC
Filing Date
2025-12-10
Publication Date
2026-06-18

AI Technical Summary

Technical Problem

Current generative model code generation solutions struggle with generating functional and bug-free code snippets efficiently, particularly in large and complex codebases, and lack effective testing mechanisms.

Method used

A framework that orchestrates multiple AI agents within a secure sandboxed environment to collaboratively generate and test multiple candidate code changes and unit tests, using iterative refinement and snapshotting to ensure correctness, with an orchestrator managing the process.

Benefits of technology

Ensures the generation of functional multi-file patches by efficiently testing and refining code changes, preventing corruption of the main codebase and ensuring the generated code meets functional requirements.

✦ Generated by Eureka AI based on patent content.


Abstract

Implementations are described herein for generating and testing code changes in a codebase. In some implementations, one or more agents may identify one or more constituent source files in the codebase that are applicable to patch trigger data. One or more of the agents may generate a plurality of candidate source code changes using one or more generative models. The candidate source code changes may be based on the one or more constituent source code files. One or more of the agents may generate a plurality of unit tests using one or more of the generative models. One or more of the agents may run different permutations of the unit tests and candidate source code changes. Based on outcomes of the different permutations of the unit tests and candidate source code changes, one of the candidate source code changes may be selected and incorporated into the multi-file patch.

Description

Attorney Docket No. GOOG-0685-WO-01

SECURE AI-ASSISTED CODE GENERATION SYSTEM WITH ITERATIVE PATCHING

Background

[0001] Automated code generation using machine learning promises to ease the burden on programmers while writing source code, freeing them from menial coding tasks so that they are able to focus on the more creative aspects of software engineering. The output of automated code generation may be used for a variety of purposes, such as code change analysis, automated testing, and the integration of new functionality into existing code. Code generation tools can use machine learning, including generative models such as large language models (LLMs), to generate useful code from large codebases. Current generative model code generation solutions are typically used to generate synthetic source code in small snippets, e.g., one at a time. While this saves at least some labor relative to writing code entirely by hand, it can still be a cumbersome process to create code snippets one at a time. Moreover, there is no guarantee that any synthetic source code generated by the generative model will function correctly. Hallucinations, buggy code, and synthetic source code that compiles but does not perform the desired function can still occur. This leaves the tasks of testing (including unit testing) and debugging with the programmer.

Summary

[0002] Implementations are provided herein for assisting in software development using artificial intelligence (AI) agents. Some implementations described herein address the problem of efficiently generating and testing code changes, particularly in large and complex codebases, to resolve bugs or implement new features. Various implementations described herein provide a framework that orchestrates multiple AI agents, each potentially employing procedural, dynamic, or hybrid approaches, or a combination thereof, to collaboratively address code modification tasks. One agent referred to herein as an “edit” agent may be configured to generate multiple candidate code changes and corresponding unit tests, allowing for selection of the best option based on test results. Other agents may be responsible for identifying relevant files of the codebase and planning the necessary code changes.

[0003] At least some of these agents operate within a secure sandboxed environment, which may take various forms, such as a virtual machine (VM) or a micro VM that may or may not run on a separate VM. These sandboxed environments provide strong isolation, preventing untrusted code from accessing sensitive resources. In various implementations, a sandboxed environment configured with selected aspects of the present disclosure may allow for snapshotting and restoring the virtual machine state, enabling efficient exploration of multiple code change strategies through forking and merging of agent trajectories. This facilitates iterative development and testing, where subsequent code changes build upon the results of previous iterations, all within the secure and recoverable sandboxed environment. The entire process, from bug detection to code generation and testing, may be managed in some implementations by an orchestrator agent, allowing for flexible combinations of procedural and dynamic agents to perform different tasks in the development process.

[0004] Implementations described herein allow for iterative refinement of code changes, where the state of the application is captured after each successful iteration, providing a starting point for the next iteration. A system configured with selected aspects of the present disclosure may operate in a cloud environment, processing code changes remotely and returning the changes to be merged into the main repository. A user may interact with the system at a high level, specifying tasks and receiving refined code changes without needing to understand the underlying agent interactions or sandboxed execution. The system may employ multiple agents, each with a specialized role (e.g., identifying relevant files, generating natural language descriptions of needed changes, generating candidate code changes and tests). The output of one agent can serve as input to another, creating a pipeline of actions that may iterate through multiple rounds of code generation and testing, refining code changes in each iteration. In some implementations, the workflow begins with processing “patch trigger data,” such as error messages, to identify relevant files, generate natural language descriptions of needed changes, generate candidate code changes and unit tests, and evaluate these changes to identify the best candidate code changes for inclusion in a multi-file patch. The resulting multi-file patch may then be applied (or “executed”) to the codebase.

Brief Description of the Drawings

[0005] Fig. 1 shows a schematic diagram of a code knowledge system interacting with multiple clients and their codebases, utilizing machine learning models and programming language corpuses.

[0006] Fig. 2 illustrates a patch generation agent coordinating multiple agents to generate a multi-file patch, including file identification, code generation, unit test generation, and iterative testing and refinement.

[0007] Fig. 3 presents a matrix illustrating the cross-evaluation of multiple candidate source code edits against multiple unit tests to select the best candidate.

[0008] Fig. 4 depicts the use of multiple micro VMs to iteratively generate and test code changes, with each micro VM capturing and restoring application states to enable efficient exploration of multiple code change strategies.

[0009] Fig. 5 shows a flowchart demonstrating selected aspects of the present disclosure.

[0010] Fig. 6 shows a flowchart demonstrating selected aspects of the present disclosure.

[0011] Fig. 7 illustrates a block diagram of an example computing device capable of executing the described system and methods.

[0012] Fig. 8 illustrates one example of how agents may be implemented, in accordance with various implementations.

[0013] Fig. 9 illustrates one example of how agents may be implemented, in accordance with various implementations.

[0014] Fig. 10 illustrates one example of how agents may be implemented, in accordance with various implementations.

[0015] Fig. 11 illustrates one example of how agents may be implemented, in accordance with various implementations.

Detailed Description

[0016] Implementations disclosed herein are directed to generating a multi-file patch to a codebase for an application. One or more agents identify one or more constituent source files in the codebase that are applicable to patch trigger data. Patch trigger data can include, for instance, error message data generated via execution of the application, a problem statement (e.g., "the application keeps crashing when processing <particular data>") issued by an individual, etc. One or more agents generate a plurality of candidate source code changes and a plurality of unit tests using one or more generative models. One or more agents run different permutations of the unit tests and candidate source code changes. Based on outcomes, one candidate source code change is selected and incorporated into the multi-file patch.

[0017] Implementations disclosed herein can mitigate (e.g., eliminate) various drawbacks with current techniques. For example, the generation of multiple candidate code changes and corresponding unit tests, as described above, addresses the problem of generating single code snippets that may be buggy or non-functional. As another example, the iterative process of generating and testing code changes, starting from the state of the application after each successful iteration, overcomes the limitations of single-step code generation and testing. As another example, the use of a secure sandboxed environment for executing the code changes and unit tests prevents the corruption of the main codebase and allows for safe exploration of multiple code change strategies.

[0018] As a non-limiting example of some implementations disclosed herein, consider a scenario where a user encounters an error message indicating that a web application crashes when a specific type of data is processed. This error message serves as the patch trigger data. One agent, possibly a dynamic agent leveraging an LLM, analyzes the error message and identifies relevant files within the application's codebase (e.g., files related to data processing). Another agent, perhaps a hybrid agent combining procedural and dynamic approaches, then generates several candidate code changes targeting these files, each intending to address the crash. A third agent generates corresponding unit tests for each candidate code change. These candidates and tests are then executed in separate, snapshottable sandboxed environments. Each sandboxed environment starts from a snapshot of the application's state before the code changes are applied. The results of the unit tests in each sandboxed environment are compared, and the candidate code change that passes the most tests and / or successfully processes the problematic data type is selected as the best patch. This patch is then applied to the codebase.

[0019] Fig. 1 schematically depicts an example environment in which selected aspects of the present disclosure may be implemented, in accordance with various implementations. Any computing devices depicted in Fig. 1 or elsewhere in the figures may include logic such as one or more microprocessors (e.g., central processing units or “CPUs”, graphical processing units or “GPUs”, tensor processing units or “TPUs”, neural processing units or “NPUs”) that execute computer-readable instructions stored in memory, or other types of logic such as application-specific integrated circuits (“ASIC”), field-programmable gate arrays (“FPGA”), and so forth. Some of the systems depicted in Fig. 1, such as a code knowledge system 102, may be implemented using one or more server computing devices that form what is sometimes referred to as a “cloud infrastructure,” although this is not required.

[0020] A code knowledge system 102 may be provided for helping clients 110-1 to 110-P manage their respective code bases 112-1 to 112-P. Code knowledge system 102 and clients 110-1 to 110-P may be communicatively coupled via one or more computer networks indicated generally at 199. Code knowledge system 102 may include, among other things, a plurality of agents 104 that are configured to perform selected aspects of the present disclosure in order to help one or more clients 110-1 to 110-P to manage and / or make changes (e.g., by way of multi-file patches) to one or more corresponding code bases 112-1 to 112-P. Each client 110 may be, for example, an entity or organization such as a business (e.g., financial institute, bank, etc.), non-profit, club, university, government agency, or any other organization that operates one or more software systems. For example, a bank may operate one or more software systems to manage the money under its control, including tracking deposits and withdrawals, tracking loans, tracking investments, and so forth. An airline may operate one or more software systems for booking / canceling / rebooking flight reservations, managing delays or cancellations of flights, managing people associated with flights, such as passengers, air crews, and ground crews, managing airport gates, and so forth.

[0021] A plurality of agents 104 may each be configured to perform one or more selected aspects of the present disclosure in order to aid clients 110-1 to 110-P in editing, updating, replatforming, migrating, or otherwise acting upon (e.g., applying patches to) their code bases 112-1 to 112-P. Agents 104 may be procedural, dynamic, or combinations thereof, and need not be of the same type. In some implementations, one or more of the agents 104 may be configured to perform some or all of the tasks described below with reference to a single code change candidate, and other ones of the agents 104 may be configured to perform similar tasks with respect to other code change candidates. In other implementations, some of the agents may perform all or part of the tasks described with reference to a single code change candidate, some agents may perform some of the tasks, and some agents may perform all of the tasks.

[0022] In various implementations, code knowledge system 102 may include a machine learning (“ML” in Fig. 1) database 105 that includes data indicative of one or more trained machine learning models 106-1 to 106-N. These trained machine learning models 106-1 to 106-N may take various forms, including generative models. Generative models may themselves take various forms, such as large language models (LLMs), and / or may take other forms, such as neural networks and / or other models that are trained to perform certain tasks, such as generating candidate code changes, testing candidate code changes, and / or analyzing source files of code changes to identify candidate code changes. Generative models may be encoder-decoder, encoder-decoder-encoder, autoencoders, and / or other forms. Some generative models may take the form of foundation models. Foundation models may act as encoders or decoders of various forms. In some implementations, a generative model may comprise a combination of other models.

[0023] Generative models may have various numbers of parameters. For example, in some implementations, a generative model may include tens of thousands to hundreds of thousands of parameters, and may be used on a resource-constrained device such as a client device operated by a client 110. Other implementations may have other numbers of parameters, such as hundreds of thousands to millions of parameters, and yet other implementations may have parameters that number in the billions, tens of billions, hundreds of billions, and even beyond.

[0024] In some implementations, code knowledge system 102 may also have access to one or more programming-language-specific corpuses 108-1 to 108-M. In some implementations, these programming-language-specific corpuses 108-1 to 108-M may be used, for instance, to train, fine-tune, or perform in-context learning using one or more of the machine learning models 106-1 to 106-N. In some implementations, the programming-language-specific corpuses 108-1 to 108-M may include examples of source code (e.g., entire code bases, libraries, etc.), inline comments, textual metadata associated with source code (e.g., commits), documentation such as textbooks and programming manuals, programming language-specific discussion threads, presentations, academic papers, and so forth. In various implementations, at least some of the machine learning models 106-1 to 106-N may be trained using at least some of the programming-language-specific corpuses 108-1 to 108-M. In some implementations, some of the programming-language-specific corpuses 108-1 to 108-M may be used as training data for generative models 106-1 to 106-N, and other ones of the programming-language-specific corpuses 108-1 to 108-M may be used as testing data for generative models 106-1 to 106-N. In some implementations, some of the programming-language-specific corpuses 108-1 to 108-M may be used exclusively for training, and other ones of the programming-language-specific corpuses 108-1 to 108-M may be used exclusively for testing.

[0025] Fig. 2 schematically depicts an example of how various agents implemented by code knowledge system 102 may cooperate within a patch generation agent 200 to generate a multi-file patch 230 for an application. In this example, patch generation agent 200 begins with patch trigger data 220. Patch trigger data 220 may be issued by a user (e.g., as a natural language problem statement) or may be generated via execution of an application, e.g., as output generated by the application or as an error message. Patch trigger data 220 may be processed by a procedural agent 204A and a dynamic agent 204B. Procedural agent 204A may be configured to programmatically (e.g., with preexisting instructions, logic, etc.) identify one or more constituent source code files 222A in a codebase 212 of the application being patched that may be applicable to patch trigger data 220. Dynamic agent 204B may also be configured to identify one or more constituent source code files 222B, except that instead of doing so programmatically, dynamic agent 204B employs one or more machine learning models (e.g., generative models 106-1 to 106-N in Fig. 1) to identify constituent source code files 222B. In many cases, constituent source code files 222A and constituent source code files 222B may at least partially overlap. In some such implementations, the overlapping constituent source code files may be the ones selected for downstream processing by other components depicted in Fig. 2.
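As a minimal, non-limiting sketch of the overlap selection described above, the following Python assumes that each agent returns a list of file paths. The function name `select_constituent_files`, the example paths, and the fallback to the union when nothing overlaps are illustrative assumptions and are not taken from the disclosure.

```python
from typing import List


def select_constituent_files(procedural: List[str], dynamic: List[str]) -> List[str]:
    """Pick files for downstream processing from two agents' results.

    Files named by both agents are preferred. Falling back to the
    de-duplicated union when nothing overlaps is an assumption.
    """
    dynamic_set = set(dynamic)
    overlap = [path for path in procedural if path in dynamic_set]
    if overlap:
        return overlap
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(procedural + dynamic))


# Hypothetical results standing in for files 222A and 222B.
files_222a = ["io/reader.py", "core/parser.py"]
files_222b = ["core/parser.py", "core/schema.py"]
print(select_constituent_files(files_222a, files_222b))  # prints ['core/parser.py']
```

Preferring the intersection keeps downstream edit generation focused on files that both the programmatic and the model-based identification agree on.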

[0026] Each of these constituent source code files 222A and 222B may be passed to an edit agent 204C. Edit agent 204C may be configured to generate a plurality of candidate source code edits 222-1-N and a plurality of unit tests (UT in Fig. 2) 223-1-M based on the constituent source code files 222A and 222B. The plurality of candidate source code edits 222-1-N and the plurality of unit tests 223-1-M may be provided to a testing agent 204D. Testing agent 204D may be configured to run different permutations of the unit tests 223-1-M and candidate source code edits 222-1-N. Based on outcomes of these different permutations of the unit tests 223-1-M and candidate source code edits 222-1-N, a particular source code edit 222E may be selected for inclusion in a multi-file patch 230. As indicated by the dashed lines, in some implementations, there may be multiple testing agents 204D, 204D', each configured to run different permutations of different candidate source code edits and unit tests for a different source code file. In Fig. 2, for example, an additional testing agent 204D' generates an additional candidate source code edit 222F for inclusion in multi-file patch 230.

[0027] In the example of Fig. 2, testing agent 204D operates multiple subagents to perform iterative refinement of source code changes. A tester agent 224 may be configured to run the unit tests 223-1-M using the candidate source code edits 222-1-N, e.g., in a sandboxed environment configured with selected aspects of the present disclosure. Based on the outcomes of this testing, a repair agent 204E (which in some implementations may be the same as dynamic agent 204B) may be configured to generate refined candidate source code edits 222-1-X, which may be provided back to tester agent 224 for additional testing. This loop may repeat until one or more criteria (e.g., predetermined number of loops, confidence measures of candidate source code edits satisfying a threshold, no more compiler errors or crashes, etc.) are met. At that point, testing agent 204D may select the candidate source code change 222E that satisfies another criterion, such as the candidate source code change 222E that was able to pass the greatest number of unit tests 223. As noted above, this selected candidate source code change 222E may be incorporated into multi-file patch 230, e.g., along with one or more additional candidate source code changes 222F generated by one or more additional testing agents.
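The test-and-repair loop described above might be sketched as follows. This is a toy illustration under stated assumptions, not the disclosed implementation: candidates are plain strings, `count_passes` stands in for the tester agent, `repair` stands in for the repair agent, and the stop criteria are limited to a loop budget and a candidate passing every test. All function names are hypothetical.

```python
from typing import Callable, List, Tuple


def refine_candidates(
    candidates: List[str],
    count_passes: Callable[[str], int],
    repair: Callable[[str], str],
    total_tests: int,
    max_loops: int = 3,
) -> Tuple[str, int]:
    """Test, repair, and retest candidates; return the best one seen."""
    if not candidates:
        raise ValueError("at least one candidate edit is required")
    best, best_passes = candidates[0], -1
    for _ in range(max_loops):
        # Tester step: score every candidate in the current round.
        scored = [(count_passes(c), c) for c in candidates]
        for passes, candidate in scored:
            if passes > best_passes:
                best, best_passes = candidate, passes
        # Stop criterion: some candidate passes every unit test.
        if best_passes == total_tests:
            break
        # Repair step: refine only the candidates that still fail a test.
        candidates = [repair(c) for passes, c in scored if passes < total_tests]
    return best, best_passes


# Toy stand-ins: a "unit test" passes when its required guard is present.
REQUIRED = ["null_check", "bounds_check"]


def toy_count_passes(candidate: str) -> int:
    return sum(guard in candidate for guard in REQUIRED)


def toy_repair(candidate: str) -> str:
    missing = [g for g in REQUIRED if g not in candidate]
    return "+".join(filter(None, [candidate, *missing[:1]]))


print(refine_candidates(["", "null_check"], toy_count_passes, toy_repair, len(REQUIRED)))
# prints ('null_check+bounds_check', 2)
```

Tracking the best candidate across rounds means a repair that makes things worse never displaces an earlier, better edit.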

[0028] In some implementations, an additional planner agent 204F may be configured to process patch trigger data 220 and / or constituent source code files 222A-B using one or more generative models to generate one or more natural language statements 226. Natural language statements 226 may include descriptions of functional changes to be made to codebase 212 and / or to constituent source code files 222A-B based on patch trigger data 220. In some implementations, candidate source code changes 222-1-N may be generated based on one or more of natural language descriptions 226. Additionally or alternatively, in some implementations, the plurality of unit tests 223-1-M may be generated based on one or more of natural language descriptions 226.

[0029] As an example, the patch trigger data 220 may include, for instance, an error code generated as a byproduct of an unexpected termination of the application. This error code may be processed by procedural agent 204A and dynamic agent 204B to identify constituent source code files 222A and 222B that are potentially applicable to the error code. The constituent source code files 222A and 222B may then be provided to edit agent 204C and planner agent 204F. Planner agent 204F may process the error code using one or more generative models to generate one or more natural language statements 226 describing one or more causes (e.g., bugs) of the error and / or one or more functional changes to be made to constituent source code files 222A and / or 222B to fix the underlying cause(s) of the error. Natural language statements 226 may then be processed by edit agent 204C to generate, e.g., for each of the constituent source code files 222A-B, a plurality of candidate source code edits 222-1-N and a plurality of unit tests 223-1-M therefrom. The candidate source code edits 222-1-N and unit tests 223-1-M may then be tested by testing agent 204D as described above. The resulting candidate source code edits 222E that satisfy one or more criteria may be incorporated into multi-file patch 230, e.g., along with any additional edits 222F generated by one or more additional testing agents 204D'. Multi-file patch 230 may then be applied to codebase 212, and the application may be executed once again. The same error or a different error may trigger another iteration of the process demonstrated by Fig. 2.

[0030] Fig. 3 depicts an example of how different permutations of unit tests 223-1-M and candidate source code edits 222-1-N may be evaluated to select one of the candidate source code edits 222E for inclusion in multi-file patch 230. In various implementations, the testing agent 204D may be configured to cross-evaluate different permutations of these unit tests 223-1-M and candidate source code edits 222-1-N, e.g., in a sandboxed environment. As demonstrated by the pictured matrix, a plurality of candidate source code edits A-J are cross-evaluated against a plurality of unit tests A-J. For example, candidate source code edit A is tested using each unit test A-J, candidate source code edit B is tested using each unit test A-J, and so on. As shown in the legend, if a candidate source code edit passes a particular unit test, the corresponding cell of the matrix is shaded. If a candidate source code edit fails a particular unit test, the corresponding cell of the matrix is left unshaded.

[0031] In this example, candidate source code edit A passes unit tests A, C, and G, and fails the rest. Candidate source code edit B passes unit tests B, C-D, and G-I, for a total of six passes, and fails the rest. Candidate source code edit C passes unit tests C and I-J, and fails the rest. Candidate source code edit D passes unit tests B, C, and G, and fails the rest, and so on. It can be seen that of all the candidate source code edits A-J, source code edit H passes the greatest number of unit tests (A-F, H), for a total of seven passes. Accordingly, in implementations in which the candidate source code edit selection criterion is whichever candidate source code edit passes the greatest number of unit tests, source code edit H would be the selected candidate source code edit for inclusion in multi-file patch 230.
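The cross-evaluation and selection just described can be illustrated with a short Python sketch. It encodes only the pass sets that the text states explicitly (edits A, B, C, D, and H); the remaining rows of Fig. 3 are not enumerated in the text and are therefore omitted rather than guessed. The function names and the callable `passes` oracle are hypothetical stand-ins for actually running a unit test against an edit in a sandbox.

```python
from itertools import product
from typing import Callable, Dict, List, Set


def cross_evaluate(
    edits: List[str], tests: List[str], passes: Callable[[str, str], bool]
) -> Dict[str, Set[str]]:
    """Run every (edit, test) pair and record which tests each edit passes."""
    results: Dict[str, Set[str]] = {edit: set() for edit in edits}
    for edit, test in product(edits, tests):
        if passes(edit, test):
            results[edit].add(test)
    return results


def select_edit(results: Dict[str, Set[str]]) -> str:
    """Return the edit with the most passes (first one wins on a tie)."""
    return max(results, key=lambda edit: len(results[edit]))


# Pass sets stated in the text for edits A, B, C, D, and H.
FIG3_PASSES = {
    "A": set("ACG"),
    "B": set("BCDGHI"),
    "C": set("CIJ"),
    "D": set("BCG"),
    "H": set("ABCDEFH"),
}
unit_tests = list("ABCDEFGHIJ")

results = cross_evaluate(
    list(FIG3_PASSES), unit_tests, lambda edit, test: test in FIG3_PASSES[edit]
)
best = select_edit(results)
print(best, len(results[best]))  # prints: H 7
```

Because every edit is scored against every test, a unit test generated alongside one candidate can still expose a defect in another candidate.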

[0032] Fig. 4 schematically depicts an example of how sandboxed environments in the form of micro VMs 452-1 to 452-7 may be used to create, preserve, and / or restore snapshots of an application state in order to aid clients 110-1 to 110-P to iteratively generate multi-file patches to the codebase 212 in accordance with various aspects of the present disclosure. In particular, Fig. 4 demonstrates how micro VMs 452-1 to 452-7 may be used to perform iterative refinement of multi-file patches 230 with respect to various constituent source code files in codebase 212. It should be noted that although seven micro VMs 452-1 to 452-7 are depicted in Fig. 4, other implementations need not be so limited. As shown by the arrow in Fig. 4, time runs from left to right.

[0033] Starting at top left, a first micro VM 452-1 is executing the application until a first application state 450-1 is reached, and an unexpected termination or "crash" is experienced, as represented by the starburst. At this point, first micro VM 452-1 snapshots first application state 450-1 and provides a snapshot of first application state 450-1 to an orchestrator agent 404. Orchestrator agent 404 may initiate the second micro VM 452-2 to implement an instance of patch generation agent 200. Patch generation agent 200 may operate within second micro VM 452-2 using the same techniques described above (e.g., with reference to Fig. 2) to generate a first multi-file patch 430-1 for the application to be incorporated into codebase 212.

[0034] At this point, a third micro VM 452-3 may execute the application starting at first application state 450-1, which may be restored, until execution of the application reaches a second application state 450-2, and another unexpected termination occurs, as represented by the starburst. As before, a snapshot of second application state 450-2 may be created and provided to orchestrator agent 404, and orchestrator agent 404 may initiate a fourth micro VM 452-4 to implement an instance of patch generation agent 200. This time, patch generation agent 200 may operate within fourth micro VM 452-4 using the same techniques described above (e.g., with reference to Fig. 2) to generate a second multi-file patch 430-2 to be incorporated into codebase 212.

[0035] Similar to before, a fifth micro VM 452-5 may execute the application starting at second application state 450-2, which may be restored, until execution of the application reaches a third application state 450-3, and yet another unexpected termination occurs, as represented by the starburst. As before, a snapshot of third application state 450-3 may be created and provided to orchestrator agent 404, and orchestrator agent 404 may initiate a sixth micro VM 452-6 to implement an instance of patch generation agent 200. This time, patch generation agent 200 may operate within sixth micro VM 452-6 using the same techniques described above (e.g., with reference to Fig. 2) to generate a third multi-file patch 430-3 to be incorporated into codebase 212. At this point, a seventh micro VM 452-7 may execute the application starting at third application state 450-3, which may be restored. Eventually an expected termination may occur, at which point the iterative refinement of the multi-file patch may complete.
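The snapshot, patch, and restore cycle of Fig. 4 can be mimicked in a few lines of Python. This is only a toy simulation under loud assumptions: the "application" is a list of named steps, a step whose name starts with `bug` crashes until it is patched, a deep copy stands in for restoring a micro VM snapshot, and adding the step name to a set stands in for the patch generation agent. None of these names or mechanics come from the disclosure.

```python
import copy
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class AppState:
    """Toy application state: index of the next step to execute."""
    next_step: int = 0


def run_from_snapshot(
    snapshot: AppState, steps: List[str], patched: Set[str]
) -> Tuple[AppState, Optional[str]]:
    """Restore a snapshot in a fresh 'micro VM' and run until crash or exit."""
    state = copy.deepcopy(snapshot)  # stand-in for restoring VM state
    while state.next_step < len(steps):
        step = steps[state.next_step]
        if step.startswith("bug") and step not in patched:
            return state, step  # crash: state at this point is the new snapshot
        state.next_step += 1
    return state, None  # expected termination


def iterate_patches(steps: List[str]) -> Tuple[List[str], AppState]:
    """Orchestrator loop: run, snapshot on crash, patch, resume from snapshot."""
    snapshot, patched, patches = AppState(), set(), []
    while True:
        snapshot, crash = run_from_snapshot(snapshot, steps, patched)
        if crash is None:
            return patches, snapshot
        patched.add(crash)  # stand-in for the patch generation agent
        patches.append(f"patch-for-{crash}")


steps = ["load", "bug1", "parse", "bug2", "save", "bug3", "exit"]
patches, final_state = iterate_patches(steps)
print(patches)      # prints ['patch-for-bug1', 'patch-for-bug2', 'patch-for-bug3']
print(final_state)  # prints AppState(next_step=7)
```

Resuming from the most recent snapshot means each iteration re-executes only the portion of the application after the last crash, which mirrors the three crash states and three patches walked through above.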

[0036] Fig. 5 depicts an example method 500 for performing selected aspects of the present disclosure. For convenience, the operations of the flow chart are described with reference to a system that performs the operations. This system may include various components of various computer systems, such as code knowledge system 102 and / or client device 110. Moreover, while operations of method 500 are shown in a particular order, this is not meant to be limiting. One or more operations may be reordered, omitted or added.

[0037] At block 502, the system may cause one or more agents (104 in Fig. 1) to identify one or more constituent source files (222A, 222B in Fig. 2) in the codebase (212 in Fig. 2) that are applicable to patch trigger data (220 in Fig. 2). This may involve using procedural agents (204A in Fig. 2) or dynamic agents (204B in Fig. 2), or both, potentially leveraging machine learning models (106-1 to 106-N in Fig. 1) and / or programming-language-specific corpuses (108-1 to 108-M in Fig. 1). The patch trigger data may comprise data generated via execution of the application, such as an error message indicating an unexpected termination, or a natural language problem statement.

[0038] At block 504, the system may cause one or more of the agents (104 in Fig. 1) to generate a plurality of candidate source code changes (222-1-N in Fig. 2) using one or more generative models (106-1 to 106-N in Fig. 1). These models may be LLMs or other types of generative models. In some implementations, the one or more generative models may include a first generative model for generating candidate source code changes (e.g., in the form of "candidate patches") and a second generative model for generating unit tests. In other implementations, the same generative model may be used to generate both. The generation of candidate source code changes may be based on the identified constituent source files and, in some implementations, may be informed by natural language descriptions of needed changes generated by other agents (e.g., planner agent 204F in Fig. 2).

[0039] At block 506, the system may cause one or more of the agents (104 in Fig. 1) to generate a plurality of unit tests (223-1-M in Fig. 2) using one or more of the generative models (106-1 to 106-N in Fig. 1). In some implementations, each unit test may be generated based on a respective candidate source code change, and in some cases, at least one unit test generated based on one candidate source code change may be used to test another candidate source code change. In some implementations, the generation of unit tests may be informed by natural language descriptions (226 in Fig. 2) of needed changes generated by other agents (e.g., planner agent 204F in Fig. 2).

[0040] At block 508, the system may cause one or more of the agents (104 in Fig. 1) to run different permutations of the unit tests (223-1-M in Fig. 2) and candidate source code changes (222-1-N in Fig. 2), e.g., in one or more sandboxed environments (452-1 to 452-7 in Fig. 4), such as VMs or micro VMs. Some such VMs or micro VMs may be snapshottable and restorable, as demonstrated in Fig. 4. This iterative process may involve multiple rounds of code generation and testing, with each iteration starting from a state of the application (450-1, 450-2, 450-3 in Fig. 4) after a successful completion of a previous iteration. The sandboxed environments allow for the safe and efficient exploration of multiple code change strategies.
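The snapshot-and-restore behavior described above may be sketched as follows. This is a minimal illustration only: the class `SandboxSketch` is a hypothetical in-memory stand-in for a snapshottable VM or micro VM, with a dictionary of file contents standing in for application state, and the `passes` callback stands in for running the unit tests.

```python
import copy


class SandboxSketch:
    """In-memory stand-in for a snapshottable, restorable sandbox (hypothetical)."""

    def __init__(self, files: dict[str, str]):
        self.files = dict(files)
        self._snapshots: list[dict[str, str]] = []

    def snapshot(self) -> int:
        """Record the current state and return an identifier for it."""
        self._snapshots.append(copy.deepcopy(self.files))
        return len(self._snapshots) - 1

    def restore(self, snapshot_id: int) -> None:
        """Roll the sandbox back to a previously recorded state."""
        self.files = copy.deepcopy(self._snapshots[snapshot_id])

    def apply(self, change: dict[str, str]) -> None:
        """Apply a candidate change, which may touch several files."""
        self.files.update(change)


def run_round(sandbox, candidates, passes) -> bool:
    """Try each candidate from the same starting state; keep the first that passes."""
    start = sandbox.snapshot()
    for change in candidates:
        sandbox.apply(change)
        if passes(sandbox.files):
            return True  # the next round starts from this successful state
        sandbox.restore(start)  # discard the failed attempt
    return False


sandbox = SandboxSketch({"a.py": "x = 1", "b.py": "y = 0"})
ok = run_round(
    sandbox,
    candidates=[{"b.py": "y = -1"}, {"a.py": "x = 2", "b.py": "y = 2"}],
    passes=lambda files: "y = 2" in files["b.py"],
)
print(ok, sandbox.files)
```

Because a failed candidate is rolled back before the next one is tried, no failed attempt leaks into the state from which later candidates or later rounds start.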

[0041] At block 510, the system may select, based on the outcomes of the different permutations run in the sandboxed environments (452-1 to 452-7 in Fig. 4), one of the candidate source code changes (222E in Fig. 2) and incorporate it into the multi-file patch (230 in Fig. 2). This selection may be based on which candidate source code change satisfies the most unit tests (Fig. 3), or other criteria. The selected candidate source code changes may be applied to the codebase 212. In some implementations, the system may include an orchestrator (404 in Fig. 4) to manage the agents (104 in Fig. 1), which may be procedural (204A in Fig. 2), dynamic, or hybrid. At block 512, the system may apply the multi-file patch (230 in Fig. 2) to the codebase (212 in Fig. 2).
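Blocks 508 and 510 may be illustrated with the following sketch, which interprets "different permutations" as every pairing of a candidate source code change with a unit test and selects the candidate that satisfies the most unit tests. The candidates and tests are hypothetical stand-ins written by hand rather than generated by a model, and a unit test that raises an exception is assumed to count as a failure.

```python
from itertools import product


def run_permutations(candidates, unit_tests):
    """Run every (candidate, unit test) pairing and record pass/fail outcomes."""
    outcomes = {}
    for (c_name, cand), (t_name, test) in product(candidates.items(), unit_tests.items()):
        try:
            outcomes[(c_name, t_name)] = bool(test(cand))
        except Exception:
            outcomes[(c_name, t_name)] = False  # an exception counts as a failed test
    return outcomes


def select_candidate(outcomes):
    """Return the candidate passing the most unit tests (first one wins a tie)."""
    scores = {}
    for (c_name, _t_name), passed in outcomes.items():
        scores[c_name] = scores.get(c_name, 0) + int(passed)
    return max(scores, key=scores.get)


# Hypothetical candidate changes for a division helper that crashes on zero.
candidates = {
    "patch_1": lambda a, b: a / b,
    "patch_2": lambda a, b: a / b if b else 0.0,
    "patch_3": lambda a, b: 0.0,
}
# Hypothetical unit tests; each may have been generated for a different candidate.
unit_tests = {
    "test_basic": lambda f: f(6, 3) == 2,
    "test_zero": lambda f: f(1, 0) == 0.0,
    "test_negative": lambda f: f(-4, 2) == -2,
}

outcomes = run_permutations(candidates, unit_tests)
selected = select_candidate(outcomes)
print(selected)  # patch_2
```

Running every test against every candidate means a unit test generated for one candidate source code change also exercises the others, consistent with the cross-testing described at block 506.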

[0042] Fig. 6 depicts an example method 600 for performing selected aspects of the present disclosure. For convenience, the operations of the flow chart are described with reference to a system that performs the operations. This system may include various components of various computer systems, such as code knowledge system 102 and / or client device 110. Moreover, while operations of method 600 are shown in a particular order, this is not meant to be limiting. One or more operations may be reordered, omitted or added.

[0043] At block 602, the system may execute an application in one or more virtual machines (e.g., micro VMs 452-1 to 452-7 in Fig. 4) until one or more predetermined events occur. These events might include error messages, unexpected terminations, functional failures of the application, particular targeted states of the application, etc.

[0044] At block 604, the system may determine whether one or more of the predetermined events has been detected. If the answer is no, method 600 may proceed back to block 602. However, upon detecting a first event of the predetermined events, at block 606, the system may capture a snapshot of a first application state (450-1 in Fig. 4) of the application corresponding to detection of the first event. This snapshot captures the state of the application at the point of the first event, preserving it for later use as a "current application state."

[0045] At block 608, and based on the first event, the system may cause a second virtual machine (452-2 in Fig. 4) to host an instance of a patch generation agent (200 in Fig. 2) that iteratively generates and tests candidate source code changes (222-1-N in Fig. 2), e.g., using unit tests (223-1-M in Fig. 2), to generate a multi-file patch (230 in Fig. 2) for a codebase (212 in Fig. 2) of the application. In some implementations, the patch generation agent may utilize generative models (106-1 to 106-N in Fig. 1) and other agents (204A, 204B, 204C, 204D, 204E in Fig. 2) as described previously.

[0046] At block 610, the system may apply the multi-file patch to the codebase to generate an updated application. This updated application incorporates the changes suggested by the patch generation agent. Method 600 may then proceed back to block 602, at which point the system may execute the updated application beginning at the "current" application state (which after one iteration would be 450-1 in Fig. 4) in the first virtual machine (452-1 in Fig. 4) or a third virtual machine (452-3 in Fig. 4) until one or more of the predetermined events occur. Method 600 may proceed as described above, and may be iterated until one or more stop conditions are reached, such as the application completing execution, the application arriving at some desired state without error, or some number of iterations having been performed.
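The loop of method 600 may be sketched as follows, under simplifying assumptions. The application is modeled as a list of step functions with a program counter, any raised exception is treated as a predetermined event, and `generate_patch` is a hypothetical stand-in for the patch generation agent that returns replacement steps keyed by position. None of these names come from the figures.

```python
import copy


def run_until_event(steps, state):
    """Blocks 602/604: run steps from the saved position; return an event or None."""
    while state["pc"] < len(steps):
        try:
            steps[state["pc"]](state)
        except Exception as exc:  # a predetermined event, here any exception
            return f"{type(exc).__name__}: {exc}"
        state["pc"] += 1
    return None


def method_600_sketch(steps, generate_patch, max_iterations=5):
    state = {"pc": 0, "data": {}}
    for _ in range(max_iterations):
        event = run_until_event(steps, state)
        if event is None:
            return state, True  # stop condition: application ran to completion
        snapshot = copy.deepcopy(state)  # block 606: capture the application state
        patch = generate_patch(event, steps)  # block 608: patch generation stand-in
        steps = [patch.get(i, s) for i, s in enumerate(steps)]  # block 610
        state = snapshot  # resume the updated application from the captured state
    return state, False  # stop condition: iteration budget exhausted


def load(state):
    state["data"]["items"] = [3, 0, 5]


def invert_buggy(state):
    state["data"]["inv"] = [1 / x for x in state["data"]["items"]]


def invert_fixed(state):
    state["data"]["inv"] = [1 / x for x in state["data"]["items"] if x]


def report(state):
    state["data"]["count"] = len(state["data"]["inv"])


def generate_patch(event, steps):
    # Hypothetical agent: replaces step 1 when a division error is reported.
    return {1: invert_fixed} if "ZeroDivisionError" in event else {}


final_state, completed = method_600_sketch([load, invert_buggy, report], generate_patch)
print(completed, final_state["data"]["count"])  # True 2
```

Resuming from the snapshot means the `load` step is not repeated after the patch is applied, which is the practical benefit of starting each iteration from the captured application state rather than from the beginning.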

[0047] Fig. 7 depicts a block diagram of an example computing device 700. Computer system 710 typically includes processor(s) 714, which communicate with a number of peripheral devices via bus subsystem 712. These peripheral devices may include a storage subsystem 724, including, for example, a memory subsystem 725 and a file storage subsystem 726, user interface output devices 720, user interface input devices 722, and a network interface subsystem 716. The input and output devices allow user interaction with computer system 710. Network interface subsystem 716 provides an interface to outside networks and is coupled to corresponding interface devices in other computer systems.

[0048] User interface input devices 722 may include a keyboard, pointing devices such as a mouse, trackball, touchpad, or graphics tablet, a scanner, a touchscreen incorporated into the display, audio input devices such as voice recognition systems, microphones, and / or other types of input devices. In general, user interface input devices 722 may include any device for inputting information into computer system 710.

[0049] User interface output devices 720 may include a display subsystem, a printer, a fax machine, or non-visual displays such as audio output devices. The display subsystem may include a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), a projection device, or some other mechanism for creating a visible image. The display subsystem may also provide non-visual display such as via audio output devices. In general, user interface output devices 720 may include any device for outputting information from computer system 710 to the user or to another machine or computer system.

[0050] Storage subsystem 724 stores programming and data constructs that provide the functionality of some or all of the modules described herein. For example, the storage subsystem 724 may include the logic to perform selected aspects of the methods of Figs. 5 and 6. These software modules are generally executed by processor(s) 714 alone or in combination with other processors. Processor(s) 714 may take various forms, such as a central processing unit (CPU), a graphics processing unit (GPU), a tensor processing unit (TPU), a neural processing unit (NPU), and so forth.

[0051] Memory 725 used in the storage subsystem 724 can include a number of memories including a main random access memory (RAM) 730 for storage of instructions and data during program execution and a read only memory (ROM) 732 in which fixed instructions are stored. A file storage subsystem 726 can provide persistent storage for program and data files, and may include a hard disk drive, a floppy disk drive along with associated removable media, a CD-ROM drive, an optical drive, or removable media cartridges. The modules implementing the functionality of certain implementations may be stored by file storage subsystem 726 in the storage subsystem 724, or in other machines accessible by the processor(s) 714.

[0052] Bus subsystem 712 provides a mechanism for letting the various components and subsystems of computer system 710 communicate with each other as intended. Although bus subsystem 712 is shown schematically as a single bus, alternative implementations of the bus subsystem may use multiple busses.

[0053] Computer system 710 can be of varying types including a workstation, server, computing cluster, blade server, server farm, or any other data processing system or computing device. Due to the ever-changing nature of computers and networks, the description of computer system 710 depicted in Fig. 7 is intended only as a specific example for purposes of illustrating some implementations. Many other configurations of computer system 710 are possible having more or fewer components than the computer system depicted in Fig. 7.

[0054] Figs. 8-11 schematically depict non-limiting examples of how agents may be implemented, in accordance with various implementations. Fig. 8 depicts how a planner agent 804 may process data, in accordance with various implementations. A worker list at left may provide a list of different types of agents (or “workers” or “tools”) that are available to the orchestrator agent 804. In this example, these types of agents include data loading, data cleaning, data wrangling, data analysis, data visualization, feature engineering, model training, model optimization, model evaluation, data exploration, data splitting, and data preparation.

[0055] In various implementations, the task data (e.g., patch trigger data 220) and worker list may be provided to planner agent 804. Planner agent 804, which may be implemented within one of the sandboxed environments mentioned elsewhere herein, may be configured to process the task data and worker list, e.g., using programmatic logic, machine learning model(s) such as trained classifiers, and / or generative model(s), to generate output. The output may include, in this example, a plan. The plan may include and / or identify, for example, workers from the worker list that should be initiated at each of a plurality of steps.
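The planner's inputs and output may be illustrated with the following sketch, which uses simple programmatic logic in place of a trained classifier or generative model. The keyword table `WORKER_KEYWORDS`, the function `make_plan`, and the sample task are hypothetical and serve only to show the shape of the data: task data and a worker list go in, and an ordered plan identifying one worker per step comes out.

```python
# Hypothetical trigger keywords per worker type; not taken from the source.
WORKER_KEYWORDS = {
    "data loading": ("load",),
    "data cleaning": ("clean", "missing"),
    "data visualization": ("plot", "chart"),
    "model training": ("train",),
    "model evaluation": ("evaluat",),
}


def make_plan(task_data: str, worker_list: list[str]) -> list[dict]:
    """Select, in worker-list order, the workers whose keywords appear in the task."""
    text = task_data.lower()
    plan = []
    for worker in worker_list:
        if any(keyword in text for keyword in WORKER_KEYWORDS.get(worker, ())):
            plan.append({"step": len(plan) + 1, "worker": worker})
    return plan


worker_list = list(WORKER_KEYWORDS)
plan = make_plan(
    "Load the CSV, handle missing values, then train a model and evaluate it.",
    worker_list,
)
for entry in plan:
    print(entry["step"], entry["worker"])
```

A planner backed by a generative model would presumably replace the keyword match while producing a plan of the same form.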

[0056] Fig. 9 schematically depicts how an orchestrator agent 904 may operate in some implementations. Task data, the plan generated by planner agent 804, and a history of subtasks already generated / performed may be provided to orchestrator agent 904. Orchestrator agent 904 may then process this data, e.g., using one or more generative models, to generate one or more subtasks. As demonstrated by the arrow going from the subtask back to the task data, and as shown at bottom, these subtasks may be generated incrementally in some implementations, and the just generated subtask may be added to the history to generate the next subtask.

[0057] Fig. 10 schematically depicts, in a different manner than Fig. 9, how an orchestrator agent 904 may operate in some implementations. Starting at left, a state may be processed by the orchestrator agent 904 to identify which worker (“data loading” in the first instance) should be triggered next. The output of that triggered data loading worker (“Summary” in Fig. 10) may be added to the state. During the next iteration, the updated state may result in the data cleaning worker being triggered, e.g., to handle any missing data and / or inconsistencies in the data previously added to the state. The result is an updated state that includes an updated summary.
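The state-driven loop described above may be sketched as follows. The selection rule in `next_worker` (take the first planned worker not yet in the history) is an assumption standing in for whatever logic or generative model the orchestrator agent uses, and the two workers are hypothetical toy implementations.

```python
def next_worker(state, plan):
    """Pick the first planned worker that has not yet run (assumed selection rule)."""
    for name in plan:
        if name not in state["history"]:
            return name
    return None


def orchestrate(state, plan, workers):
    """Trigger one worker per iteration and add its summary to the state."""
    while (name := next_worker(state, plan)) is not None:
        state["summary"] = workers[name](state)
        state["history"].append(name)
    return state


def data_loading(state):
    state["rows"] = [1, None, 3]
    return f"loaded {len(state['rows'])} rows"


def data_cleaning(state):
    before = len(state["rows"])
    state["rows"] = [r for r in state["rows"] if r is not None]
    return f"dropped {before - len(state['rows'])} missing value(s)"


workers = {"data loading": data_loading, "data cleaning": data_cleaning}
state = orchestrate(
    {"history": [], "summary": None},
    plan=["data loading", "data cleaning"],
    workers=workers,
)
print(state["history"], state["summary"])
```

Each worker reads the state left by its predecessor, so the data cleaning worker operates on the rows that the data loading worker added.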

[0058] Fig. 11 schematically depicts one example of how a worker agent may be implemented. A state may include the high-level task being performed, the particular subtask assigned to this worker agent, and a plan for performing the subtask. This particular worker agent has two actions available to it: code block (to generate a block of code) and finish task (to complete the code generation). Here, the result of the first state is the worker agent selecting the code block action, at which point data indicative of reasoning, dataframes, and code execution is added to the state. This is repeated during the next iteration, and additional data indicative of reasoning, dataframes, and code execution is added to the state once again. Finally, when the selected action is finish task, data indicative of reasoning is added to the state.
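A worker agent with these two actions may be sketched as follows. The `policy` callable is a hypothetical stand-in for the generative model that chooses the next action, here replaced by a scripted list of actions. The sketch records reasoning and code execution results in the state and omits dataframes for brevity. It calls Python's built-in `exec`, which is only appropriate inside a sandboxed environment of the kind described elsewhere herein.

```python
def run_worker(state, policy, max_steps=10):
    """Loop over actions until finish_task, recording each step in the state."""
    namespace = {}  # variables persist across code blocks, as in a notebook
    for _ in range(max_steps):
        action = policy(state)
        entry = {"action": action["type"], "reasoning": action["reasoning"]}
        state["trace"].append(entry)
        if action["type"] == "finish_task":
            break
        exec(action["code"], namespace)  # code_block: run the generated code
        entry["result"] = namespace.get("result")
    return state


# Scripted stand-in for model output; a real policy would be generated per state.
scripted = [
    {"type": "code_block", "reasoning": "count the values",
     "code": "values = [4, 8, 15]\nresult = len(values)"},
    {"type": "code_block", "reasoning": "compute the mean",
     "code": "result = sum(values) / len(values)"},
    {"type": "finish_task", "reasoning": "mean computed; subtask complete"},
]

state = {"task": "analyze data", "subtask": "compute mean", "trace": []}
state = run_worker(state, policy=lambda s: scripted[len(s["trace"])])
for entry in state["trace"]:
    print(entry["action"], entry.get("result"))
```

The final trace entry carries reasoning only, mirroring the description that the finish task action adds data indicative of reasoning without further code execution.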

[0059] In various implementations, a method is provided for generating a multi-file patch to a codebase for an application. One or more agents may identify one or more constituent source files in the codebase that may be applicable to patch trigger data. Patch trigger data may include, for example, error message data generated via execution of the application, or a problem statement issued by an individual. One or more agents may generate a plurality of candidate source code changes and a plurality of unit tests using one or more generative models. One or more agents may run different permutations of the unit tests and candidate source code changes. Based on outcomes, one candidate source code change may be selected and incorporated into the multi-file patch.

[0060] In various implementations, the multi-file patch may be applied to the codebase. One or more of the generative models may include one or more large language models. A first generative model may be used to generate the plurality of candidate multi-file patches, and a second generative model may be used to generate the plurality of unit tests. Iterative generation and testing of candidate source code changes may be performed, wherein each iteration may start from the state of the application after a successful completion of a previous iteration.

[0061] The different permutations of the unit tests and candidate source code changes may be executed in one or more sandboxed environments. The sandboxed environment may include a virtual machine. The virtual machine may be snapshottable and restorable. The patch trigger data may include data generated via execution of the application. The patch trigger data may include an error message generated based on an unexpected termination of the application. The patch trigger data may include a natural language problem statement.

[0062] One or more agents may generate natural language descriptions of changes to be made to the codebase based on the patch trigger data. The plurality of candidate source code changes may be generated based on one or more of the natural language descriptions. The plurality of unit tests may be generated based on one or more of the natural language descriptions. Each of the plurality of unit tests may be generated based on a respective candidate source code change. At least one of the unit tests generated based on one of the candidate source code changes may be used to test another of the candidate source code changes. The one or more constituent files may be identified by a procedural agent. The one or more constituent files may be identified by a dynamic agent.

[0063] In other implementations, a system and / or transitory or non-transitory computer-readable storage medium may be provided for performing any of the methods described above. The system may include one or more processors and memory storing instructions that, when executed by the one or more processors, cause the one or more processors to perform any of the methods described above. In various implementations, a transitory or non-transitory computer-readable storage medium may be provided. The computer-readable storage medium may include instructions that, when executed by one or more processors, cause the one or more processors to perform any of the methods described above.

[0064] In another aspect, a system is provided for generating and testing code changes in a codebase. The system may include a plurality of agents, wherein at least one agent may identify relevant files in the codebase, at least one agent may generate a plurality of candidate patches based on the relevant files, and at least one agent may generate a plurality of unit tests for the candidate patches. An evaluator may evaluate different permutations of the unit tests and candidate patches to select a given patch of the plurality of candidate patches for inclusion in a multi-file patch. At least one of the agents may include a large language model. The agents may be selected from the group consisting of procedural agents, dynamic agents, and hybrid agents. A sandboxed environment may be provided for executing the candidate patches and unit tests. The sandboxed environment may include a micro virtual machine (micro VM) that may be snapshottable and restorable.

[0065] While several implementations have been described and illustrated herein, a variety of other means and / or structures for performing the function and / or obtaining the results and / or one or more of the advantages described herein may be utilized, and each of these variations and modifications is deemed to be within the scope of the implementations described herein. More generally, all parameters, dimensions, materials, and configurations described herein are meant to be exemplary and that the actual parameters, dimensions, materials, and / or configurations will depend upon the specific application or applications for which the teachings is / are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific implementations described herein. It is, therefore, to be understood that the foregoing implementations are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, implementations may be practiced otherwise than as specifically described and claimed. Implementations of the present disclosure are directed to each individual feature, system, article, material, kit, and / or method described herein.

Claims

Attorney Docket No. GOOG-0685-WO-01

What is claimed is:

1. A method for generating a multi-file patch to a codebase for an application, the method comprising: causing one or more agents to identify one or more constituent source files in the codebase that are applicable to patch trigger data; causing one or more of the agents to generate a plurality of candidate source code changes using one or more generative models, wherein the candidate source code changes are based on the one or more constituent source code files; causing one or more of the agents to generate a plurality of unit tests using one or more of the generative models; causing one or more of the agents to run different permutations of the unit tests and candidate source code changes; and based on outcomes of the different permutations of the unit tests and candidate source code changes, selecting, and incorporating into the multi-file patch, one of the candidate source code changes.

2. The method of claim 1, further comprising applying the multi-file patch to the codebase.

3. The method of claim 1 or 2, wherein one or more of the generative models comprises one or more large language models.

4. The method of any of the preceding claims, wherein a first generative model is used to generate the plurality of candidate multi-file patches, and a second generative model is used to generate the plurality of unit tests.

5. The method of any of the preceding claims, further comprising: iteratively generating and testing candidate source code changes, wherein each iteration starts from a state of the application after a successful completion of a previous iteration.

6. The method of any of the preceding claims, wherein the different permutations of the unit tests and candidate source code changes are executed in one or more sandboxed environments.

7. The method of claim 6, wherein the sandboxed environment comprises a virtual machine.

8. The method of claim 7, wherein the virtual machine is snapshottable and restorable.

9. The method of any of the preceding claims, wherein the patch trigger data comprises data generated via execution of the application.

10. The method of claim 9, wherein the patch trigger data comprises an error message generated based on an unexpected termination of the application.

11. The method of any of the preceding claims, wherein the patch trigger data comprises a natural language problem statement.

12. The method of any of the preceding claims, further comprising causing one or more of the agents to generate natural language descriptions of changes to be made to the codebase based on the patch trigger data.

13. The method of claim 12, wherein the plurality of candidate source code changes are generated based on one or more of the natural language descriptions.

14. The method of claim 12, wherein the plurality of unit tests are generated based on one or more of the natural language descriptions.

15. The method of any of the preceding claims, wherein each of the plurality of unit tests is generated based on a respective candidate source code change.

16. The method of claim 15, wherein at least one of the unit tests generated based on one of the candidate source code changes is used to test another of the candidate source code changes.

17. The method of any of the preceding claims, wherein the one or more constituent files are identified by a procedural agent.

18. The method of any of the preceding claims, wherein the one or more constituent files are identified by a dynamic agent.

19. A system comprising one or more processors and memory storing instructions that, in response to execution of the instructions by the one or more processors, cause the one or more processors to perform the method of any one of claims 1 to 18.

20. At least one transitory or non-transitory computer-readable medium comprising instructions that, in response to execution of the instructions by one or more processors, cause the one or more processors to perform the method of any one of claims 1 to 18.

21. A system for generating and testing code changes in a codebase, comprising: a plurality of agents, wherein at least one agent identifies relevant files in the codebase, at least one agent generates a plurality of candidate patches based on the relevant files, and at least one agent generates a plurality of unit tests for the candidate patches; and an evaluator that evaluates different permutations of the unit tests and candidate patches to select a given patch of the plurality of candidate patches for inclusion in a multi-file patch.

22. The system of claim 21, wherein at least one of the agents comprises a large language model.

23. The system of claim 21 or 22, wherein the agents are selected from the group consisting of procedural agents, dynamic agents, and hybrid agents.

24. The system of any of claims 21-23, further comprising: a sandboxed environment for executing the candidate patches and unit tests.

25. The system of claim 24, wherein the sandboxed environment comprises a micro virtual machine (micro VM) that is snapshottable and restorable.

26. A method for iteratively generating and testing code changes for an application, the method comprising: (a) establishing a plurality of sandboxed virtual machine environments, each environment comprising a snapshottable and restorable state of the application; (b) generating, in at least one of the sandboxed virtual machine environments, a plurality of candidate code changes based on a trigger event; (c) generating, in at least one of the sandboxed virtual machine environments, a plurality of unit tests corresponding to the plurality of candidate code changes; (d) executing, in at least one of the sandboxed virtual machine environments, at least one permutation of the plurality of candidate code changes and the plurality of unit tests; (e) restoring, from a snapshot, the state of at least one of the sandboxed virtual machine environments; and (f) repeating steps (b)-(e) based on the results of step (d).

27. A method for iteratively generating and testing code changes, comprising: establishing a plurality of sandboxed virtual machine environments, each environment comprising a snapshottable and restorable state of an application; generating, in at least one of the sandboxed virtual machine environments, a plurality of candidate code changes; generating, in at least one of the sandboxed virtual machine environments, a plurality of unit tests corresponding to the plurality of candidate code changes; executing, in at least one of the sandboxed virtual machine environments, at least one permutation of the plurality of candidate code changes and the plurality of unit tests; restoring, from a snapshot, the state of at least one of the sandboxed virtual machine environments; and repeating the generating, generating, and executing steps based on the results of the executing step.

28. A method implemented using one or more processors, comprising: executing an application in one or more virtual machines until one or more predetermined events occur; upon detecting a first event of the predetermined events, capturing a snapshot of a first application state of the application corresponding to detection of the first event; based on the first event, causing a second virtual machine to host an instance of a patch generation agent that iteratively generates and tests candidate source code changes to generate a multi-file patch for a codebase of the application; applying the multi-file patch to the codebase to generate an updated application; and executing the updated application beginning at the first application state in the first virtual machine or a third virtual machine until one or more of the predetermined events occur.

29. The method of claim 28, further comprising: upon detecting another given event of the predetermined events, capturing a snapshot of a second application state of the application corresponding to detection of the another given event; based on the another given event, causing the same instance of the patch generation agent hosted by the second virtual machine, or a different instance of the patch generation agent hosted by a fourth virtual machine, to iteratively generate and test candidate source code changes to generate another multi-file patch for the codebase of the application; and applying the another multi-file patch to the codebase to generate a further updated application.

30. The method of claim 28 or 29, wherein one or more instances of the virtual machines comprise a micro VM.

31. The method of any of claims 28-30, wherein one or more of the predetermined events comprises an error message.

32. The method of any of claims 28-31, wherein one or more of the predetermined events comprises an unexpected termination of the application.

33. The method of any of claims 28-32, wherein one or more of the predetermined events comprises a functional failure of the application.

34. A system comprising one or more processors and memory storing instructions that, in response to execution of the instructions by the one or more processors, cause the one or more processors to perform the method of any one of claims 28 to 33.

35. At least one transitory or non-transitory computer-readable medium comprising instructions that, in response to execution of the instructions by one or more processors, cause the one or more processors to perform the method of any one of claims 28 to 33.