Executing a computational graph on a graphics processing unit

By generating data entities containing machine code and data structures at compile time, the problem of wasted GPU resources in existing compilation systems is solved, and more efficient GPU utilization is achieved.

CN114429201B · Active · Publication Date: 2026-06-12 · GOOGLE LLC

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: GOOGLE LLC
Filing Date: 2018-01-08
Publication Date: 2026-06-12

AI Technical Summary

Technical Problem

Existing compilation systems require the host CPU to perform some of the computations. While the CPU does so, the GPU sits idle, so its hardware resources are wasted.

Method used

Data entities are generated at compile time. The compiler produces compilation artifacts for the GPU, containing machine code, data structures, buffer data, and library data, and the data entity is generated from these artifacts. After compilation, the data entity is invoked at runtime to cause the GPU to process the computation graph.

Benefits of technology

It improves the processing efficiency of the GPU, reduces the frequent interaction between the GPU and the host CPU, and improves the overall system efficiency.

✦ Generated by Eureka AI based on patent content.


Abstract

This application relates to executing a computation graph on a graphics processing unit. Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, are described for generating a data entity that causes a processing unit to process a computation graph. In one aspect, a method includes the acts of receiving data identifying a computation graph, the computation graph including a plurality of nodes representing operations; obtaining a compilation artifact for processing the computation graph on the processing unit; and generating the data entity from the compilation artifact, wherein the data entity, when invoked, causes the processing unit to process the computation graph by performing the operations represented by the plurality of nodes.

Description

[0001] Divisional Application Statement

[0002] This application is a divisional application of Chinese invention patent application 201810015495.3, filed on January 8, 2018.

Technical Field

[0003] This specification relates to processing computational graphs representing neural networks.

Background Technology

[0004] A neural network is a machine learning model that uses one or more layers to generate outputs, such as one or more classifications, from received inputs. Some neural networks include one or more hidden layers in addition to the output layer. The output of each hidden layer is used as the input to the next layer in the network, that is, the next hidden layer or the output layer. Each layer of the network generates an output from the received inputs based on the current values of its corresponding set of parameters.

Summary of the Invention

[0005] This specification generally describes a system, implemented as computer programs on one or more computers in one or more locations, that generates data entities. When a data entity is invoked, it causes a graphics processing unit (GPU) to process a computation graph by performing the operations associated with the computation graph.

[0006] In general, one innovative aspect of the subject matter described in this specification can be embodied in a method comprising: receiving data identifying a computation graph, the computation graph including a plurality of nodes representing operations; obtaining a compilation artifact for processing the computation graph on a processing unit; and generating a data entity from the compilation artifact, wherein the data entity, when invoked, causes the processing unit to process the computation graph by performing the operations represented by the plurality of nodes.

[0007] The foregoing and other embodiments may each optionally include one or more of the following features, alone or in combination. In particular, one embodiment includes all of the following features in combination. The compilation artifact further includes buffer data representing a plurality of buffer parameters and the associations between the plurality of buffer parameters and the operations, each of the plurality of buffer parameters being associated with a corresponding one of the operations. The compilation artifact further includes corresponding logical numbers assigned to the plurality of buffer parameters, and data specifying the associations between the logical numbers and the operations represented by the plurality of nodes. The compilation artifact further includes a data structure representing (i) a sequence of the operations and (ii) the dependencies between the operations. The data structure is a directed acyclic graph (DAG). The method further includes maintaining a plurality of libraries, each of which stores one or more subroutines. The compilation artifact further includes library data representing a plurality of buffer parameters and the associations between the plurality of buffer parameters and the plurality of libraries, each of the plurality of buffer parameters being associated with a corresponding one of the libraries. The compilation artifact further includes machine code configured to process the computation graph. The operations are operations for processing an input to a neural network through one or more layers of the neural network to generate an output of the neural network. The operations are operations for training the neural network by adjusting the values of its parameters. The processing unit is a GPU or a central processing unit (CPU).

[0008] In general, another innovative aspect of the subject matter described in this specification can be embodied in a method comprising: receiving input buffer parameters as user-specific input to a computation graph, the computation graph including a plurality of nodes representing operations; receiving a data entity including buffer data representing (i) a plurality of buffer parameters including the input buffer parameters and (ii) the associations between the plurality of buffer parameters and the operations; and invoking the data entity using the input buffer parameters such that a processing unit processes the computation graph according to the input buffer parameters by performing the operations.

[0009] The foregoing and other embodiments may each optionally include one or more of the following features, alone or in combination. In particular, one embodiment includes all of the following features in combination. Invoking the data entity using the input buffer parameters includes: identifying, based on the data entity, one or more operations from the computation graph that correspond to the input buffer parameters; and queuing the one or more operations on the processing unit to process the computation graph. The operations include multiple operation groups, and the data entity includes a data structure representing (i) multiple streams, each stream representing a sequence of operations, and (ii) dependencies between the operations. In that case, invoking the data entity using the input buffer parameters includes: identifying, based on the data entity, multiple streams from the computation graph, each stream being associated with a corresponding operation group among the operations; for each of the multiple streams, queuing the corresponding operation group into that stream; and defining, based on the data entity, the dependencies between the multiple operation groups. The processing unit is a GPU or a CPU.

[0010] The subject matter described in this specification can be implemented in specific embodiments to achieve one or more of the following advantages. Conventional compilation systems require a host CPU to process computations. In such conventional compilation systems, the GPU is idle while the host CPU is processing some computations. Therefore, conventional compilation systems cannot efficiently utilize the GPU's hardware resources. Unlike conventional compilation systems, the system described in this specification generates data entities that include all the necessary descriptions for performing computations. Specifically, the data entities are generated at compile time and can then be invoked at runtime to cause the GPU to process the computation graph. This allows all computations represented by the computation graph to be queued as a whole on the GPU, thereby improving the efficiency of machines including GPUs.

[0011] Details of one or more embodiments of the subject matter of this specification are set forth in the accompanying drawings and the following description. Other features, aspects, and advantages of this subject matter will become apparent from the description, drawings, and claims.

Description of the Drawings

[0012] Figure 1A shows an example computation graph system for generating data entities from a computation graph.

[0013] Figure 1B shows an example machine for processing computation graphs.

[0014] Figure 2 is a flowchart of an example process for generating data entities from a computation graph.

[0015] Figure 3 is a flowchart of an example process for invoking a data entity to process a computation graph by performing the operations represented by the computation graph.

[0016] Figure 4 is a flowchart of an example process for invoking a data entity to process a computation graph by performing the operations represented by the computation graph.

[0017] Similar reference numerals and names in the various figures indicate similar elements.

Detailed Description

[0018] This specification generally describes a computational graph system for generating data entities that, when invoked, cause a graphics processing unit (GPU) to process the computational graph by performing operations represented by the computational graph. Specifically, when a data entity is invoked with input buffer parameters, the data entity is queued as a whole into the GPU, and the GPU processes the computational graph according to the input buffer parameters.

[0019] Figure 1A shows an example computation graph system 100 for generating data entities from a computation graph. The computation graph system 100 is an example of a system implemented as computer programs on one or more computers in one or more locations, in which the systems, components, and techniques described below can be implemented.

[0020] A user of client 102 can request to perform operations on a computation graph representing a neural network. Client 102 can be an application running on a computer. As part of the request, client 102 provides data identifying the computation graph to computation graph system 100 and specifies the type of operation to be performed on that computation graph. For example, the request could identify the computation graph representing an inference of a particular neural network and could identify the input on which the inference should be performed. As another example, the request could identify the computation graph representing a neural network to be trained and the input, such as training data, on which training should be performed.

[0021] The computation graph system 100 receives data identifying a computation graph as input. The computation graph expresses a computation, for example that of a machine learning model, using nodes that represent operations and directed edges that represent data dependencies between the operations. An incoming edge to a node represents a flow of input into that node, i.e., an input argument to the operation represented by that node. Once all the arguments required by an operation are available to its node, the node is enabled and can be executed.

[0022] Outgoing edges from a node represent the output stream of an operation represented by that node, which is to be used as input to an operation represented by another node. Therefore, a directed edge connecting a first node in the graph to a second node in the graph indicates that the output generated by the operation represented by the first node is used as input to the operation represented by the second node.
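The node-and-edge model in paragraphs [0021] and [0022] can be sketched in a few lines of Python. This is only an illustration: the `Node` class, the `enabled_nodes` helper, and the node names are assumptions made for the example and do not appear in the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One operation in the graph (hypothetical structure, for illustration)."""
    name: str
    op: str                                      # e.g. "input", "matmul", "relu"
    inputs: list = field(default_factory=list)   # producers, i.e. incoming edges


def enabled_nodes(nodes, available):
    """Names of not-yet-computed nodes whose input arguments are all available."""
    return [n.name for n in nodes
            if n.name not in available and all(i in available for i in n.inputs)]


# x and w feed a matrix multiplication, whose output feeds a ReLU.
graph = [
    Node("x", "input"),
    Node("w", "input"),
    Node("y", "matmul", inputs=["x", "w"]),
    Node("z", "relu", inputs=["y"]),
]
```

Calling `enabled_nodes(graph, {"x", "w"})` returns only `y`, because `z` still waits on the output that flows along the edge from `y`.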

[0023] In some implementations, the operations represented in the computation graph are linear algebra operations (such as matrix multiplication), neural network operations, or operations for a different kind of machine learning model. A neural network is a machine learning model that uses one or more non-linear units to predict an output for a received input. Some neural networks are deep neural networks that include one or more hidden layers in addition to an output layer. The output of each hidden layer is used as input to another layer in the network, that is, another hidden layer, the output layer, or both. Some layers of the network generate an output from a received input based on the current values of a corresponding set of parameters, while other layers of the network may not have parameters.

[0024] The operations represented by the computation graph can be the operations necessary for the neural network to compute an inference, that is, to process an input through the layers of the neural network to generate a neural network output for that input. Alternatively, the operations can be those necessary to train the neural network by performing a neural network training procedure, for example one that uses backpropagation to determine trained values of the parameters from their initial values. In some cases, such as during neural network training, the operations represented by the computation graph can include operations performed by multiple copies of the neural network.

[0025] As an illustration, a neural network layer that receives input from a previous layer can use a parameter matrix to perform matrix multiplication between that parameter matrix and the input. In some cases, matrix multiplication can be represented as multiple nodes in a computation graph. For example, matrix multiplication can be divided into multiple multiplication and addition operations, and each operation can be represented by a different node in the computation graph. The operation represented by each node can generate a corresponding output, which flows along directed edges to subsequent nodes. After the operation represented by the final node generates the result of the matrix multiplication, the result flows along directed edges to another node. This result is equivalent to the output of the neural network layer that performed the matrix multiplication.

[0026] In some other cases, matrix multiplication is represented as a node in the graph. The operation represented by this node can take an input tensor on a first directed edge and a weight tensor—e.g., a parameter matrix—on a second directed edge as input. This node can process—for example, perform matrix multiplication of the input and weight tensors—to output an output tensor on a third directed edge that is equivalent to the output of a neural network layer.
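The two representations of matrix multiplication in paragraphs [0025] and [0026] can be compared with a small Python sketch. The function names and the `("mul", value)` / `("add", value)` node records are assumptions for illustration; the patent does not prescribe how the decomposition is encoded.

```python
def matmul_single_node(a, b):
    """Matrix multiplication treated as one node: two tensors in, one tensor out."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


def matmul_many_nodes(a, b):
    """Same product, split into one node per scalar multiplication or addition."""
    rows, inner, cols = len(a), len(b), len(b[0])
    nodes = []                                   # (op name, value produced)
    result = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = None
            for k in range(inner):
                prod = a[i][k] * b[k][j]
                nodes.append(("mul", prod))
                if acc is None:
                    acc = prod                   # first product needs no addition
                else:
                    acc = acc + prod
                    nodes.append(("add", acc))
            result[i][j] = acc
    return result, nodes


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
single = matmul_single_node(a, b)
many, nodes = matmul_many_nodes(a, b)
```

Both calls produce the same matrix; the decomposed form simply exposes twelve fine-grained nodes (eight multiplications and four additions) for this 2x2 case.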

[0027] Other neural network operations that can be represented by nodes in the computation graph include other mathematical operations such as subtraction, division, and gradient calculation; array operations such as concatenation, splicing, splitting, or ranking; and neural network building block operations such as SoftMax, Sigmoid, Rectified Linear Unit (ReLU), or convolution.

[0028] In an example system, one or more sets of nodes in a computation graph can represent operations that control the flow of data through the computation graph. For example, the one or more sets of nodes can represent conditional, recursive, and / or iterative control flow statements, including: if statements, while loops, do-while loops, for loops, for-each loops, or nested control flow statements that include combinations of these statements.

[0029] A set of one or more nodes in a computation graph can represent operations that can be translated into operations in a high-performance library containing high-performance implementations of linear algebra (e.g., matrix multiplication) or neural network operations (e.g., backward convolution).

[0030] In an example compilation system, multiple operations are merged into a single fused operation, which at code generation time can be converted into one call that executes all of the fused operations. This fusion process generates efficient code for devices such as CPUs or GPUs.
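As a rough illustration of operation fusion, the following Python sketch combines three elementwise operations into one callable that makes a single pass over the data. The `fuse` helper and the three example operations are assumptions for this sketch; a real compiler would emit fused device code rather than compose Python functions.

```python
def fuse(*ops):
    """Combine elementwise operations into one call applied in a single pass."""
    def fused(values):
        out = []
        for v in values:
            for op in ops:        # apply every fused operation to the element
                v = op(v)
            out.append(v)
        return out
    return fused


def scale(v):
    return 2.0 * v


def shift(v):
    return v - 1.0


def relu(v):
    return max(0.0, v)


fused_op = fuse(scale, shift, relu)
result = fused_op([0.0, 1.0, 2.0])
```

The benefit being illustrated is that no intermediate list is materialized between `scale`, `shift`, and `relu`, which mirrors how a fused kernel avoids writing intermediate results back to memory.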

[0031] The computation graph system 100 includes a compiler 103 and a data entity generator 105. The compiler 103 can translate the computation graph of operations into machine code that, when executed by one or more devices, causes the devices to perform the operations represented by the computation graph. In some embodiments, a machine may include one or more devices, such as a GPU and a CPU. The GPU may be controlled by a host CPU and receives various requests from host programs running on the host CPU. For example, the GPU may receive requests one at a time: a request to launch a single data-parallel subroutine, to mark a point of interest in a sequence of launches, or to allocate or deallocate GPU memory, that is, dedicated memory on the GPU device.

[0032] Specifically, when the device on which the computation graph is to be executed is a GPU, compiler 103 generates a compiled artifact for processing the computation graph on the GPU. The compiled artifact includes all the descriptions necessary to perform the operations represented by the nodes of the computation graph. In some implementations, the compiled artifact includes machine code, data structures, buffer data, and library data.

[0033] Machine code is generated to process the computation graph. For example, machine code can be generated for the nodes of the computation graph, the sequence of operations in the computation graph, and the array sizes of the operations in the computation graph. Because this machine code is generated for a specific computation graph, it does not need to handle operations with every possible array size, which can improve processing speed. The GPU uses the machine code to process the computation graph, for example to execute the specific operations represented by the computation graph. In some implementations, the machine code may not include code for the CPU. Even then, the GPU can perform the operations when the data entity is invoked. Data entities are described in more detail below with reference to Figure 1B.

[0034] In some implementations, the compilation artifact includes a data structure. In some implementations, the data structure may be a DAG (Directed Acyclic Graph). The data structure may represent (i) a sequence of operations represented by nodes of the computation graph and (ii) dependencies between operations.
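A DAG that records both a sequence of operations and their dependencies can be sketched with Python's standard `graphlib` module (Python 3.9 or later). The operation names below are hypothetical; the patent does not specify how the data structure is encoded.

```python
from graphlib import TopologicalSorter

# Maps each operation to the set of operations it depends on (hypothetical names).
dependencies = {
    "matmul": {"load_x", "load_w"},
    "add_bias": {"matmul", "load_b"},
    "relu": {"add_bias"},
}

# A valid sequence of operations is any topological order of the DAG.
order = list(TopologicalSorter(dependencies).static_order())
```

Any order returned this way places every operation after the operations it depends on, so the single structure carries both pieces of information named in the paragraph above.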

[0035] In some implementations, compilation artifacts include buffer data. Buffer data specifies which operation is performed on which buffer when the GPU processes the computation graph by performing operations represented by the computation graph. Buffer data represents buffer parameters and the association between buffer parameters and operations. In some implementations, buffer parameters can be logical numbers. For example, corresponding logical numbers can be assigned to each buffer, and a specific operation can be performed on a specific buffer by specifying the logical number assigned to a specific buffer.
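One way to picture buffer data is as two small tables: logical numbers standing in for buffers, and a record of which operation uses which logical number. The layout below, including the `reads` / `writes` split and the `ops_touching` helper, is an assumption for illustration rather than the format used by the patent.

```python
# Logical buffer numbers stand in for concrete device allocations.
buffer_params = {0: "input", 1: "weights", 2: "matmul_out", 3: "output"}

# For each operation: logical numbers of the buffers it reads and the one it writes.
buffer_assoc = {
    "matmul": {"reads": [0, 1], "writes": 2},
    "relu": {"reads": [2], "writes": 3},
}


def ops_touching(buffer_number):
    """Operations associated with the buffer that has the given logical number."""
    return sorted(op for op, use in buffer_assoc.items()
                  if buffer_number in use["reads"] or buffer_number == use["writes"])
```

With this layout, specifying logical number 2 is enough to find both the operation that fills the buffer (`matmul`) and the operation that consumes it (`relu`).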

[0036] In some implementations, compilation artifacts include library data. Library data indicates which GPU library should be invoked for a specific operation. In some implementations, library data may also indicate which buffer is used for which GPU library. Like buffer data, corresponding logical numbers can be assigned to each library, and a specific GPU library can be invoked for a specific buffer by specifying the logical number assigned to that buffer.

[0037] The data entity generator 105 generates a data entity from the compilation artifacts. The data entity can include all the descriptions necessary to perform the operations represented by the nodes of the computation graph. For example, all the compilation artifacts described above can be bundled together to form the data entity, so that the data entity includes the machine code, the data structures such as DAGs, the buffer data, and the library data. In some embodiments, the characteristics of the machine code, data structures, buffer data, and library data in the compilation artifacts are preserved in the data entity. For example, the dependencies between operations in the compilation artifacts can be preserved in the data entity. In some embodiments, the data entity can be loaded into the machine's memory.

[0038] The computation graph system 100 provides the data entity to one of machines 160-166 via a data communication network 114. Examples of machines 160-166 include computing devices, personal computers, mobile devices, and servers. Each machine may include one or more devices 120-126, such as GPUs and CPUs. Referring to Figure 1A, machines 160-166 include respective devices 120-126, such as GPUs. Because the data entity contains all the descriptions necessary to perform the computation, the machine that receives the data entity can have its device (for example, the GPU) use the data entity to process the entire computation graph. In some implementations, because all the necessary descriptions, such as how to configure the GPU and the buffers, have already been determined at compile time and included in the data entity, the GPU can process the entire computation graph without interruption, which reduces or avoids the frequent interactions between the GPU and the host CPU that are typical of conventional systems. When the data entity is invoked, the device processes the computation graph by performing the operations represented by the nodes of the computation graph and generates an output. The machine that includes the device can return this output to the computation graph system 100, which can then return the requested output to the client 102. In some implementations, network 114 can be a local area network (LAN) or a wide area network (WAN). In some implementations, machines 160-166 may additionally include memory for storing instructions and data, such as random access memory (RAM), and a processor for executing the stored instructions. In general, each machine is a hardware resource that performs computations independently of the other machines.

[0039] Figure 1B shows example machines 160-166 and example devices 120-126 for processing computation graphs. In Figure 1B, machines 160-166 include respective devices 120-126, such as GPUs. Machine 160 receives input buffer parameters 130, which are user-specific inputs to the computation graph. Machines 160-166 can also receive data entities from the computation graph system 100. In this example, machine 160, which includes device 120, receives a data entity from the computation graph system 100 and invokes it using the input buffer parameters, causing device 120 to process the computation graph. Because the data entity is generated from compilation artifacts that include buffer data, the data entity includes that buffer data: the buffer parameters, including the input buffer parameters, and the associations between the buffer parameters and the operations. The operations from the computation graph can therefore be identified based on the input buffer parameters. The operations are queued as a whole on device 120. Once the operations are queued, device 120 processes the computation graph by executing the queued operations and generates an output. Machine 160 returns this output to the computation graph system 100, which can then return the output to client 102.
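The invocation flow of Figure 1B can be imitated in plain Python. Everything here is an assumption made for illustration: `FakeDevice` is a stand-in for a GPU, each operation is a Python callable with logical buffer numbers for its reads and writes, and `invoke` is a hypothetical name. The point of the sketch is only that all operations are submitted in one step instead of one host request per operation.

```python
from collections import deque


class FakeDevice:
    """Stand-in for a GPU: holds a queue and runs everything in it."""

    def __init__(self):
        self.queue = deque()
        self.submissions = 0          # how many times the host handed over work

    def enqueue_all(self, ops):
        self.queue.extend(ops)
        self.submissions += 1

    def run(self, buffers):
        while self.queue:
            fn, reads, writes = self.queue.popleft()
            buffers[writes] = fn(*(buffers[r] for r in reads))
        return buffers


def invoke(entity_ops, input_buffers, output_number, device):
    """Queue every operation as a whole, then let the device process the graph."""
    buffers = dict(input_buffers)     # logical buffer number -> contents
    device.enqueue_all(entity_ops)    # single submission for the entire graph
    return device.run(buffers)[output_number]


# Each entry: (operation, logical numbers read, logical number written).
entity_ops = [
    (lambda x: [2 * v for v in x], [0], 1),
    (lambda x: [max(0, v) for v in x], [1], 2),
]
device = FakeDevice()
result = invoke(entity_ops, {0: [-1, 2]}, 2, device)
```

Here the input buffer parameter is logical buffer 0, and the output is read back from logical buffer 2 after the device has drained its queue.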

[0040] In some implementations, multiple machines 160-166 may each receive input buffer parameters 130 and data entities. In this example, operations are queued into each of devices 120-126, and devices 120-126 can process the computation graph simultaneously by performing operations. Devices 120-126 generate corresponding outputs, and machines 160-166 return the outputs to the computation graph system 100.

[0041] Figure 2 is a flowchart of an example process 200 for generating a data entity from a computation graph. In some implementations, the data entity is generated as part of a compilation task, before the computation graph is processed. In some examples, the generation is performed on the host CPU, and the computation graph is then processed on the GPU based on the generated data entity.

[0042] For convenience, process 200 is described as being executed by a system of one or more computers located at one or more locations and appropriately programmed according to this specification. For example, the example computation graph system 100 of Figure 1A, appropriately programmed, can execute process 200.

[0043] The system receives data (202) identifying the computation graph. The computation graph may include nodes representing operations.

[0044] The system obtains compiled artifacts (204) for processing the computation graph on the GPU. The compiled artifacts are generated by a compiler that compiles the computation graph for processing by the GPU. The compiled artifacts include descriptions necessary to perform the operations represented by the nodes of the computation graph. For example, compiled artifacts include machine code, data structures representing multiple sequences of operations and dependencies between operations, buffer data, and library data.

[0045] The system generates a data entity (206) from the compiled artifacts. When the data entity is invoked, it causes the GPU to process the computation graph by performing the operations represented by the multiple nodes. The data entity may include all the descriptions necessary to perform the operations represented by the nodes of the computation graph. For example, all the compiled artifacts described above can be bundled together to form the data entity. As a result, the data entity includes the machine code, the data structures such as DAGs, the buffer data, and the library data that enable the GPU to process the computation graph by performing the operations. In some implementations, the characteristics of the machine code, data structures, buffer data, and library data in the compiled artifacts may be preserved in the data entity. For example, the dependencies between operations in the compiled artifacts may be preserved in the data entity. In some implementations, the data entity may be loaded into the machine's memory.
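The three steps of process 200 (202, 204, 206) can be sketched end to end as follows. This is a minimal illustration under stated assumptions: the `DataEntity` fields mirror the four artifact kinds named in the text, but the function names, the request format, the placeholder bytes, and the `hypothetical_blas` library name are all invented for the example, and no real GPU code is produced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataEntity:
    """Bundle of the four artifact kinds named in the text."""
    machine_code: bytes
    dag: dict            # operation -> operations it depends on
    buffer_data: dict    # operation -> logical buffer number
    library_data: dict   # operation -> library to call


def receive_graph(request):
    """Step 202: take the graph description out of a client request."""
    return request["graph"]


def obtain_artifacts(graph):
    """Step 204: stand-in for the compiler; emits placeholder artifacts only."""
    ops = list(graph)
    return {
        "machine_code": b"placeholder",
        "dag": {op: tuple(deps) for op, deps in graph.items()},
        "buffer_data": {op: number for number, op in enumerate(ops)},
        "library_data": {op: "hypothetical_blas" for op in ops if op == "matmul"},
    }


def generate_data_entity(artifacts):
    """Step 206: bundle every artifact into one invocable entity."""
    return DataEntity(**artifacts)


request = {"graph": {"matmul": (), "relu": ("matmul",)}}
entity = generate_data_entity(obtain_artifacts(receive_graph(request)))
```

Because the bundle copies the dependency table unchanged, the dependency of `relu` on `matmul` in the artifacts is preserved in the resulting entity.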

[0046] Figure 3 is a flowchart of an example process 300 for invoking a data entity to process a computation graph by performing the operations represented by the computation graph. For convenience, process 300 is described as being executed by one or more GPUs. For example, the example machines 160-166 of Figure 1B, appropriately programmed, can execute process 300.

[0047] The machine receives input buffer parameters (302). The machine may include one or more devices, such as a GPU and a CPU. The input buffer parameters are user-specific inputs to the computation graph. In some implementations, the machine may receive input buffer parameters from a user. The computation graph includes multiple nodes representing operations.

[0048] The machine receives a data entity (304). The data entity may include buffer data, which represents (i) multiple buffer parameters including input buffer parameters and (ii) the association between the multiple buffer parameters and the operation. In some implementations, the machine may receive the data entity from a computation graph system.

[0049] The machine invokes the data entity (306) using the input buffer parameters. When invoked, the data entity causes the device, such as a GPU, to process the computation graph by performing the operations according to the input buffer parameters. In some implementations, the operations from the computation graph that correspond to the input buffer parameters can be identified based on the data entity. The machine queues the identified operations as a whole onto the device for processing the computation graph.

[0050] Figure 4 is a flowchart of an example process 400 for invoking a data entity to process a computation graph by performing the operations represented by the computation graph. For convenience, process 400 is described as being executed by one or more GPUs. For example, the example machines 160-166 of Figure 1B, appropriately programmed, can execute process 400. A machine may include one or more devices, such as a GPU and a CPU. In this example, the operations comprise multiple operation groups, and the data entity includes a data structure representing (i) multiple streams, each stream representing a sequence of operations, and (ii) the dependencies between the operations.

[0051] The machine identifies streams (402) from the computation graph based on the data entity. Each stream can be associated with a corresponding operation group among the operations.

[0052] For each stream, the machine queues the corresponding operation group into the stream associated with that operation group (404). The machine can identify the operation group for each stream based on the data entity. Once the machine has identified the operation group for a stream, it queues that operation group into the stream.

[0053] The machine defines dependencies between multiple operation groups based on data entities (406). A data entity includes a data structure representing the dependencies between operations. Based on these dependencies, the dependencies between multiple operation groups can be determined. In some implementations, dependencies from all streams can be constructed to a specific stream. This specific stream can be defined as the main stream.
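Steps 402 to 406 can be sketched in Python as follows. The stream names, group names, and helper functions are assumptions for illustration, and the ordering is computed on the host with `graphlib` purely to show the effect of the dependencies; on an actual GPU the streams would run concurrently and synchronize through device-side mechanisms that this sketch does not model.

```python
from graphlib import TopologicalSorter

# Hypothetical data-entity contents: streams and group-level dependencies.
streams = {
    "stream_a": ["conv_group"],
    "stream_b": ["pool_group"],
    "main": ["merge_group"],      # the main stream waits on all other streams
}
group_deps = {"merge_group": {"conv_group", "pool_group"}}


def enqueue(streams):
    """Step 404: queue each operation group into its own stream."""
    return {name: list(groups) for name, groups in streams.items()}


def execution_order(queues, group_deps):
    """Step 406: combine in-stream order with cross-stream dependencies."""
    graph = {g: set(group_deps.get(g, ())) for q in queues.values() for g in q}
    for q in queues.values():
        for prev, nxt in zip(q, q[1:]):
            graph[nxt].add(prev)  # groups in one stream run in queue order
    return list(TopologicalSorter(graph).static_order())


queues = enqueue(streams)
order = execution_order(queues, group_deps)
```

In this example the two independent groups may appear in either order, but the group on the main stream always comes last because it depends on both.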

[0054] Embodiments of the subject matter and functional operation described in this specification may be implemented using digital electronic circuits, tangibly implemented computer software or firmware, computer hardware including the structures disclosed in this specification and their structural equivalents, or a combination of one or more of these. Embodiments of the subject matter described in this specification may be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by a data processing apparatus or for controlling the operation of a data processing apparatus. Alternatively or additionally, the program instructions may be encoded in artificially generated propagating signals—such as machine-generated electrical, optical, or electromagnetic signals—generated to encode information for transmission to a suitable receiver device for execution by the data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of these. However, the computer storage medium is not a propagating signal.

[0055] The term "data processing apparatus" encompasses all kinds of devices, apparatuses, and machines used for processing data, including, for example, a programmable processor, a computer, or multiple processors or computers. The apparatus may include special-purpose logic circuitry, such as an FPGA (Field-Programmable Gate Array) or an ASIC (Application-Specific Integrated Circuit). In addition to hardware, the apparatus may also include code that creates an execution environment for the computer program in question, such as code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of these.

[0056] A computer program (also referred to or described as a program, software, software application, module, software module, script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages; and it can be deployed in any form, including as a standalone program or module, component, subroutine, or other unit suitable for a computing environment. A computer program may, but does not need to, correspond to a file in a file system. A program may be stored as a part of a file that holds other programs or data—for example, one or more scripts stored in a markup language document—in a single file dedicated to the program in question, or in multiple coordinating files—for example, a file that stores portions of one or more modules, subroutines, or code. A computer program can be deployed to execute on a single computer, located in one place, or distributed across multiple locations and interconnected via a communication network.

[0057] As used herein, "engine" or "software engine" refers to a software-implemented input/output system that provides outputs distinct from its inputs. An engine can be a coded functional block, such as a library, platform, software development kit ("SDK"), or object. Each engine can be implemented on any suitable type of computing device, such as a server, mobile phone, tablet computer, notebook computer, music player, e-book reader, laptop or desktop computer, PDA, smartphone, or other fixed or portable device comprising one or more processors and computer-readable media. Furthermore, two or more engines can be implemented on the same computing device or on different computing devices.

[0058] The processes and logical flows described in this specification can be performed by one or more programmable computers that execute one or more computer programs to perform functions by processing input data and generating output. The processing and logical flows can also be executed by special-purpose logic circuitry—such as FPGAs (Field-Programmable Gate Arrays) or ASICs (Application-Specific Integrated Circuits)—and the apparatus can also be implemented as special-purpose logic circuitry.

[0059] A computer suitable for executing computer programs includes, for example, a central processing unit (CPU) that may be based on a general-purpose or special-purpose microprocessor, or both, or any other type. Typically, the CPU receives instructions and data from read-only memory or random access memory, or both. The basic components of a computer are the CPU for executing or carrying out instructions and one or more memory devices for storing instructions and data. Typically, a computer will also include one or more mass storage devices for storing data—such as disks, magneto-optical disks, or optical disks—or operatively coupled to receive data from or transfer data to one or more mass storage devices, or both. However, a computer does not necessarily need to have these devices. Furthermore, a computer may embed another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, or a portable storage device—such as a universal serial bus (USB) flash drive, to name a few.

[0060] Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices such as EPROM, EEPROM, and flash memory devices; magnetic disks, such as internal hard disks or removable disks; magneto-optical disks; CD-ROMs and DVD-ROMs. Processors and memory may be supplemented by or incorporated into dedicated logic circuitry.

[0061] To provide interaction with the user, embodiments of the subject matter described in this specification can be implemented on a computer having: a display device for displaying information to the user—such as a CRT (cathode ray tube) or LCD (liquid crystal display) monitor—and a keyboard and pointing device—such as a mouse or trackball—through which the user can provide input to the computer. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback, such as visual, auditory, or tactile feedback; input from the user in any form, including acoustic, verbal, or tactile input, can be received. Additionally, the computer can interact with the user by sending documents to and receiving documents from the device used by the user—for example, by sending web pages to a web browser in response to a request received from a web browser on the user's client device.

[0062] Embodiments of the subject matter described in this specification can be implemented in a computing system that includes back-end components, such as a data server; or middleware components, such as an application server; or front-end components, such as a client computer having a graphical user interface or web browser that a user can interact with through an implementation of the subject matter described in this specification; or any combination of one or more of these back-end, middleware, or front-end components. The components of the system can be interconnected via digital data communication of any form or medium—e.g., a communication network. Examples of communication networks include local area networks (“LANs”) and wide area networks (“WANs”), such as the Internet.

[0063] A computing system may include clients and servers. Clients and servers are typically geographically separated and usually interact through a communication network. The client-server relationship is created by computer programs running on the respective computers and having a client-server relationship with each other.

[0064] While this specification contains numerous details of specific implementations, these should not be construed as limiting the scope of any invention or the scope of the claims, but rather as descriptions of features specific to particular embodiments of a particular invention. Certain features described in this specification in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented separately in multiple embodiments or in any suitable sub-combination. Furthermore, while features may be described above as functioning in certain combinations and even initially claimed in this way, one or more features of a claimed combination may in some cases be removed from the combination, and the claimed combination may be directed to a sub-combination or a variation of a sub-combination.

[0065] Similarly, although the operations are depicted in a specific order in the accompanying drawings, this should not be construed as requiring these operations to be performed in the specific order shown or in sequential order, or to perform all illustrated operations to achieve the desired result. In some cases, multitasking and parallel processing may be advantageous. Furthermore, the separation of the various system modules and components in the above embodiments should not be construed as requiring such separation in all embodiments, and it should be understood that the described program components and systems may generally be integrated together in a single software product or packaged in multiple software products.

[0066] Other implementation methods are summarized in the following examples:

[0067] Example 1: A method comprising: receiving data identifying a computation graph, the computation graph including a plurality of nodes representing operations; obtaining a compilation artifact for processing the computation graph on a processing unit; and generating a data entity from the compilation artifact, wherein the data entity, when invoked, causes the processing unit to process the computation graph by performing the operations represented by the plurality of nodes.

[0068] Example 2: The method as described in Example 1, wherein the compilation artifact further includes: buffer data representing a plurality of buffer parameters and the association between the plurality of buffer parameters and the operations, each of the plurality of buffer parameters being associated with a corresponding one of the operations.

[0069] Example 3: The method as described in Example 2, wherein the compilation artifact further includes: a corresponding logical number assigned to the plurality of buffer parameters, and data specifying the association between the logical number and the operation represented by the plurality of nodes.

[0070] Example 4: The method as described in any one of Examples 1 to 3, wherein the compilation artifact further includes: a data structure representing (i) the sequence of operations and (ii) the dependencies between the operations.

[0071] Example 5: The method described in Example 4, wherein the data structure is a directed acyclic graph.

[0072] Example 6: The method as described in any one of Examples 1 to 5 further includes: maintaining a plurality of libraries, each of which stores one or more subroutines.

[0073] Example 7: The method as described in Example 6, wherein the compilation artifact further includes: library data, the library data representing a plurality of buffer parameters and associations between the plurality of buffer parameters and the plurality of libraries, each of the plurality of buffer parameters being associated with a corresponding library among the plurality of libraries.

[0074] Example 8: The method as described in any one of Examples 1 to 7, wherein the compilation artifact further comprises: machine code configured to process the computation graph.

[0075] Example 9: The method as described in any one of Examples 1 to 8, wherein the operation is an operation for processing the input of the neural network through one or more layers of the neural network to generate the output of the neural network.

[0076] Example 10: The method as described in any one of Examples 1 to 9, wherein the operation is an operation used to train the neural network by adjusting the parameter values of the neural network.

[0077] Example 11: The method as described in any one of Examples 1 to 10, wherein the processing unit is a graphics processing unit (GPU) or a central processing unit (CPU).

[0078] Example 12: A method comprising: receiving input buffer parameters as user-specified input to a computation graph, the computation graph including a plurality of nodes representing operations; receiving a data entity including buffer data representing (i) a plurality of buffer parameters including the input buffer parameters and (ii) the association between the plurality of buffer parameters and the operations; and invoking the data entity using the input buffer parameters such that a processing unit processes the computation graph according to the input buffer parameters by performing the operations.

[0079] Example 13: The method as described in Example 12, wherein invoking the data entity using the input buffer parameter includes: identifying one or more operations from the computation graph based on the data entity, the one or more operations corresponding to the input buffer parameter, and queuing the one or more operations on the processing unit to process the computation graph.

[0080] Example 14: The method as described in Example 12 or 13, wherein the operations comprise a plurality of operation groups, and the data entity comprises a data structure representing (i) a plurality of streams, each stream representing a sequence of operations, and (ii) dependencies between the operations, and wherein invoking the data entity using the input buffer parameters comprises: identifying a plurality of streams from the computation graph based on the data entity, each stream being associated with a corresponding operation group among the operations; for each of the plurality of streams, queuing the corresponding operation group into the stream associated with that operation group; and defining the dependencies between the plurality of operation groups based on the data entity.

[0081] Example 15: The method as described in any one of Examples 12 to 14, wherein the processing unit is a GPU or a CPU.

[0082] Example 16: A system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform system operations, the system operations including: receiving data identifying a computation graph, the computation graph including a plurality of nodes representing operations; obtaining a compilation artifact for processing the computation graph on a processing unit; and generating a data entity from the compilation artifact, wherein the data entity, when invoked, causes the processing unit to process the computation graph by performing the operations represented by the plurality of nodes.

[0083] Example 17: A system as described in Example 16, wherein the compilation artifact further includes: buffer data representing a plurality of buffer parameters and the association between the plurality of buffer parameters and the operations, each of the plurality of buffer parameters being associated with a corresponding one of the operations.

[0084] Example 18: The system as described in Example 17, wherein the compilation artifact further includes: a corresponding logical number assigned to the plurality of buffer parameters, and data specifying the association between the logical number and the operation represented by the plurality of nodes.

[0085] Example 19: A system as described in any one of Examples 16 to 18, wherein the compilation artifact further includes: a data structure representing (i) the sequence of operations and (ii) the dependencies between the operations.

[0086] Example 20: A system as described in Example 19, wherein the data structure is a directed acyclic graph.

[0087] Example 21: The system described in any one of Examples 16 to 20 further includes: maintaining a plurality of libraries, each of which stores one or more subroutines.

[0088] Example 22: A system as described in Example 21, wherein the compilation artifact further includes: library data representing a plurality of buffer parameters and associations between the plurality of buffer parameters and the plurality of libraries, each of the plurality of buffer parameters being associated with a corresponding library among the plurality of libraries.

[0089] Example 23: A system as described in any one of Examples 16 to 22, wherein the compilation artifact further includes machine code configured to process the computation graph.

[0090] Example 24: A system as described in any one of Examples 16 to 23, wherein the operation is an operation for processing the input of the neural network through one or more layers of the neural network to generate the output of the neural network.

[0091] Example 25: A system as described in any one of Examples 16 to 24, wherein the operation is an operation for training the neural network by adjusting the parameter values of the neural network.

[0092] Example 26: A system as described in any one of Examples 16 to 25, wherein the processing unit is a GPU or a CPU.

[0093] Example 27: A system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform system operations, the system operations comprising: receiving input buffer parameters as user-specified input to a computation graph, the computation graph including a plurality of nodes representing operations; receiving a data entity including buffer data representing (i) a plurality of buffer parameters including the input buffer parameters and (ii) the association between the plurality of buffer parameters and the operations; and invoking the data entity using the input buffer parameters such that a processing unit processes the computation graph according to the input buffer parameters by performing the operations.

[0094] Example 28: The system as described in Example 27, wherein invoking the data entity using the input buffer parameter includes: identifying one or more operations from the computation graph based on the data entity, the one or more operations corresponding to the input buffer parameter, and queuing the one or more operations on the processing unit to process the computation graph.

[0095] Example 29: A system as in Example 27 or 28, wherein the operations comprise multiple operation groups and the data entity comprises a data structure representing: (i) multiple streams, each stream representing a sequence of operations, and (ii) dependencies between the operations, and wherein invoking the data entity using the input buffer parameters comprises: identifying multiple streams from the computation graph based on the data entity, each stream being associated with a corresponding operation group among the operations; for each of the multiple streams, queuing the corresponding operation group into the stream associated with that operation group; and defining the dependencies between the multiple operation groups based on the data entity.

[0096] Example 30: A system as in one of Examples 27 to 29, wherein the processing unit is a GPU or a CPU.
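The relationship between the compilation artifact, the data entity, and its invocation with input buffer parameters (Examples 1 to 5 and 12 to 13) can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the class and field names are hypothetical, plain Python callables stand in for machine code, and integer keys stand in for the logical numbers assigned to buffer parameters. It is not the implementation described in this specification.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class CompilationArtifact:
    kernels: dict      # operation name -> callable standing in for machine code
    buffer_of: dict    # operation name -> (input logical numbers, output logical number)
    depends_on: dict   # operation name -> set of operations that must run first (a DAG)


@dataclass
class DataEntity:
    artifact: CompilationArtifact
    order: list        # operations in an order that respects the dependencies

    def invoke(self, input_buffers):
        """Perform every operation, reading and writing buffers by logical number."""
        buffers = dict(input_buffers)
        for op in self.order:
            in_ids, out_id = self.artifact.buffer_of[op]
            args = [buffers[i] for i in in_ids]
            buffers[out_id] = self.artifact.kernels[op](*args)
        return buffers


def generate_data_entity(artifact):
    """Resolve the execution order once, before the entity is ever invoked."""
    order = list(TopologicalSorter(artifact.depends_on).static_order())
    return DataEntity(artifact, order)


# Illustrative graph: out = scale(x) + shift(x), with x bound to logical number 0.
artifact = CompilationArtifact(
    kernels={
        "scale": lambda x: [2 * v for v in x],
        "shift": lambda x: [v + 1 for v in x],
        "add": lambda a, b: [p + q for p, q in zip(a, b)],
    },
    buffer_of={"scale": ((0,), 1), "shift": ((0,), 2), "add": ((1, 2), 3)},
    depends_on={"scale": set(), "shift": set(), "add": {"scale", "shift"}},
)
entity = generate_data_entity(artifact)
result = entity.invoke({0: [1, 2, 3]})
```

The design point the sketch tries to capture is that ordering and buffer bindings are fixed when the data entity is generated, so invocation only supplies the input buffers and walks a precomputed schedule.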

[0097] Specific embodiments of the subject matter have been described. Other embodiments are within the scope of the appended claims. For example, the actions set forth in the claims can be performed in a different order and still achieve the desired result. As an example, the processes depicted in the figures do not necessarily require the specific order or sequence shown to achieve the desired result. In some cases, multitasking and parallel processing can be advantageous.

Claims

1. A method for compiling machine code for one or more processing units controlled by one or more central processing units, the method comprising: Receive computation graph and input buffer parameters, wherein the computation graph includes multiple nodes representing the corresponding operations; Generate a compilation artifact, wherein the compilation artifact includes machine code for execution by the one or more processing units to perform the corresponding operations represented by the plurality of nodes in the computation graph, wherein the compilation artifact includes the input buffer parameters, and wherein the compilation artifact excludes any machine code for execution by the one or more central processing units; A data entity is generated from the compiled artifact. When the data entity is invoked at runtime by the one or more processing units, the one or more processing units process the computation graph according to the input buffer parameters and do not execute any machine code executed by the one or more central processing units. The processing by the one or more processing units includes: The corresponding operations are queued in the one or more processing units. Based on the data entity, define the dependencies between the corresponding operations, and The corresponding queued operations are performed by the one or more processing units according to the dependencies, without executing any machine code executed by the one or more central processing units.

2. The method as described in claim 1, wherein, The one or more processing units include a graphics processing unit (GPU).

3. The method as described in claim 1, wherein, The data entity includes machine code that, when executed by the one or more processing units, causes the one or more processing units to invoke the data entity to process the computation graph according to the input buffer parameters.

4. The method of claim 1, wherein, The corresponding operations are queued in a specific order, and the processing includes: Generate instructions that, when executed by the one or more processing units, cause the one or more processing units to perform the corresponding operations in the specific order.

5. The method of claim 4, wherein, Queuing the corresponding operations in one or more processing units includes: Generate instructions that, when executed by the one or more processing units, cause the one or more processing units to queue corresponding subsets of the corresponding operations at each of the one or more processing units; and Generate instructions that, when executed by the one or more processing units, cause the one or more processing units to perform, at each processing unit, the corresponding subset of the corresponding operations for that processing unit.

6. The method of claim 1, wherein, Each of the one or more processing units includes a corresponding buffer, and the method further includes: The one or more processing units assign the first operation of the corresponding operation to the buffer of a specific processing unit among the one or more processing units for execution based on the buffer data.

7. The method of claim 1, wherein, The input buffer parameters include: Multiple buffer parameters, including the input buffer parameters and the association between the multiple buffer parameters and the corresponding operation, wherein each of the multiple buffer parameters is associated with a corresponding operation in the corresponding operation.

8. The method of claim 1, wherein, The compiled artifacts include: A data structure representing (i) the sequence of operations corresponding to the corresponding operations represented by the computation graph and (ii) the dependencies between the corresponding operations.

9. The method of claim 1, wherein, The compiled artifacts include: Library data, which represents multiple buffer parameters and the association between the multiple buffer parameters and multiple libraries, the multiple libraries including one or more subroutines, each of the multiple buffer parameters being associated with a corresponding library in the multiple libraries.

10. The method of claim 1, wherein, The corresponding operation is an operation used to process the input of the neural network through one or more layers of the neural network to generate the output of the neural network.

11. A system for compiling machine code for one or more processing units controlled by one or more central processing units, the system comprising: One or more computers and one or more storage devices, the storage devices storing instructions that, when executed by the one or more computers, are operable to cause the one or more computers to perform a first operation for compiling machine code for one or more processing units, including: Receive computation graph and input buffer parameters, wherein the computation graph includes multiple nodes representing the corresponding operations; Generate a compilation artifact, wherein the compilation artifact includes machine code for execution by the one or more processing units to perform the corresponding operations represented by the plurality of nodes in the computation graph, wherein the compilation artifact includes the input buffer parameters, and wherein the compilation artifact excludes any machine code for execution by the one or more central processing units configured to control the one or more processing units; A data entity is generated from the compiled artifact. When the data entity is invoked at runtime by the one or more processing units, the one or more processing units process the computation graph according to the input buffer parameters and do not execute any machine code executed by the one or more central processing units. The processing by the one or more processing units includes: The corresponding operations are queued in the one or more processing units. Based on the data entity, define the dependencies between the corresponding operations, and The corresponding queued operations are performed by the one or more processing units according to the dependencies, without executing any machine code executed by the one or more central processing units.

12. The system of claim 11, wherein, The one or more processing units include a graphics processing unit (GPU).

13. The system of claim 11, wherein, The data entity includes machine code that, when executed by the one or more processing units, causes the one or more processing units to invoke the data entity to process the computation graph according to the input buffer parameters.

14. The system of claim 11, wherein, The corresponding operations are queued in a specific order, and the processing further includes: Generate instructions that, when executed by the one or more processing units, cause the one or more processing units to perform the corresponding operations in the specific order.

15. The system of claim 14, wherein, Queuing the corresponding operations in one or more processing units includes: Generate instructions that, when executed by the one or more processing units, cause the one or more processing units to queue corresponding subsets of the corresponding operations at each of the one or more processing units; and Generate instructions that, when executed by the one or more processing units, cause the one or more processing units to perform, at each processing unit, the corresponding subset of the corresponding operations for that processing unit.

16. The system of claim 11, wherein, Each of the one or more processing units includes a corresponding buffer, and the first operation further includes: The one or more processing units assign the first and second operations in the corresponding operations to the buffer of a specific processing unit among the one or more processing units for execution based on the buffer data.

17. The system of claim 11, wherein, The input buffer parameters include: Multiple buffer parameters, including the input buffer parameters and the association between the multiple buffer parameters and the corresponding operation, each of the multiple buffer parameters being associated with a corresponding second operation in the corresponding operation.

18. The system of claim 11, wherein, The compiled artifacts include: Library data, which represents multiple buffer parameters and the association between the multiple buffer parameters and multiple libraries, the multiple libraries including one or more subroutines, each of the multiple buffer parameters being associated with a corresponding library in the multiple libraries.

19. The system of claim 11, wherein, The corresponding operation is an operation used to process the input of the neural network through one or more layers of the neural network to generate the output of the neural network.

20. A plurality of non-transitory computer-readable storage media encoded with instructions, said instructions, when executed by a plurality of computers, causing the plurality of computers to perform a first operation for compiling machine code for one or more processing units, the first operation comprising: Receive computation graph and input buffer parameters, wherein the computation graph includes multiple nodes representing the corresponding operations; Generate a compilation artifact, wherein the compilation artifact includes machine code for execution by the one or more processing units to perform the corresponding operations represented by the plurality of nodes in the computation graph, wherein the compilation artifact includes the input buffer parameters, and wherein the compilation artifact excludes any machine code for execution by the one or more central processing units configured to control the one or more processing units; A data entity is generated from the compiled artifact. When the data entity is invoked by the one or more processing units at runtime, the one or more processing units process the computation graph by performing the corresponding operations according to the input buffer parameters, without executing any machine code executed by the one or more central processing units. The processing by the one or more processing units includes: The corresponding operations are queued in the one or more processing units. Based on the data entity, define the dependencies between the corresponding operations, and The corresponding queued operations are performed by the one or more processing units according to the dependencies, without executing any machine code executed by the one or more central processing units.