Operator operation methods, apparatus and related products

By generating target operators within a deep learning framework and determining their operating modes based on preset strategies, the problem of IO consumption caused by operator switching and invocation is solved, achieving unified support for multi-mode operators and reducing the difficulty of maintenance and debugging.

CN115374915B · Active · Publication Date: 2026-06-30 · CAMBRIAN (KUNSHAN) INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CAMBRIAN (KUNSHAN) INFORMATION TECH CO LTD
Filing Date
2021-05-19
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

In existing technologies, operator switching and invocation in neural networks result in high I/O consumption, increasing maintenance costs and debugging difficulty.

Method used

After generating the target operator in the deep learning framework, its operating mode is determined by a preset detection strategy, and the corresponding operating mode is executed according to the mode, such as layer-by-layer or fusion mode, which reduces the need for separate encapsulation of operators of different modes.

Benefits of technology

While ensuring the running results, it reduces the maintenance cost and debugging difficulty at the deep learning framework level, and achieves unified support for multi-mode operators.

✦ Generated by Eureka AI based on patent content.

Abstract

This disclosure relates to a method, apparatus, and related products for operator execution. According to instructions, a target operator is generated within a deep learning framework. Then, based on a preset detection strategy, the operating mode of the target operator is determined. If the operating mode of the target operator is a first operating mode, the target operator is executed in the first operating mode; if the operating mode of the target operator is a second operating mode, the target operator is executed in the second operating mode. This method enables operators supporting multiple operating modes to be implemented simultaneously by encapsulating only one set of operators at the deep learning framework level. While ensuring the operating results, it greatly reduces maintenance costs and debugging difficulty at the framework level.

Description

Technical Field

[0001] This disclosure relates to the field of neural network technology, and in particular to a method, apparatus and related products for operating an operator.

Background Technology

[0002] With the development of artificial intelligence technology, neural networks have been widely used in various fields. For example, convolutional neural networks are used for image recognition in image processing tasks.

[0003] Neural networks involve various operators, such as convolution operators, pooling operators, sampling operators, and so on. Switching between these operators and their mutual calls can generate significant I/O overhead. Therefore, to reduce I/O overhead and lower latency in neural networks, some hardware provides corresponding underlying mechanisms and interfaces.

[0004] However, these hardware components often provide interfaces for operators in different modes, which greatly increases maintenance costs and debugging difficulty at the framework level.

Summary of the Invention

[0005] Therefore, it is necessary to provide a method, apparatus, and related products for operator operation that can reduce maintenance costs and debugging difficulties at the framework level to address the aforementioned technical problems.

[0006] In a first aspect, embodiments of this disclosure provide a method for operating an operator, the method comprising:

[0007] Based on the instructions, the target operator is generated in the deep learning framework;

[0008] Based on the preset detection strategy, the operating mode of the target operator is determined;

[0009] If the target operator's operating mode is the first operating mode, the target operator is executed in the first operating mode; if the target operator's operating mode is the second operating mode, the target operator is executed in the second operating mode.

[0010] In a second aspect, embodiments of this disclosure provide an apparatus for operating an operator, the apparatus comprising:

[0011] The generation module is used to generate target operators in the deep learning framework according to instructions;

[0012] The determination module is used to determine the operating mode of the target operator based on a preset detection strategy;

[0013] The execution module is used to execute the target operator in the first execution mode if the target operator's execution mode is the first execution mode, and to execute the target operator in the second execution mode if the target operator's execution mode is the second execution mode.

[0014] In a third aspect, this disclosure provides a data processing apparatus including a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the steps of the method described in the first aspect.

[0015] In a fourth aspect, this disclosure provides a combined processing apparatus, which includes the data processing apparatus described in the third aspect, a general interconnect interface, and other processing apparatuses; the data processing apparatus interacts with the other processing apparatuses.

[0016] In a fifth aspect, this disclosure provides a chip that includes the combined processing apparatus described in the fourth aspect.

[0017] In a sixth aspect, this disclosure provides a board card, which includes the chip described in the fifth aspect.

[0018] In a seventh aspect, this disclosure provides an electronic device, which includes the board card described in the sixth aspect.

[0019] In an eighth aspect, this disclosure provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the steps of the method described in the first aspect.

[0020] In a ninth aspect, embodiments of this disclosure provide a storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the method described in the first aspect.

[0021] The method, apparatus, and related products for operator execution provided in this disclosure embodiment generate a target operator in a deep learning framework according to instructions. Then, based on a preset detection strategy, the operating mode of the target operator is determined. If the operating mode of the target operator is a first operating mode, the target operator is executed in the first operating mode; if the operating mode of the target operator is a second operating mode, the target operator is executed in the second operating mode. In this method, by determining the operating mode of the operator based on a preset detection strategy before executing the operator, it is equivalent to setting up an operator operating mode detection step. The operating mode of the target operator can be determined through this operator operating mode detection step, and then the operator is executed in the corresponding operating mode. In this way, it is not necessary to separately encapsulate operators of different modes in the deep learning framework. This allows the deep learning framework level to only encapsulate one set of operators to achieve operators that simultaneously support multiple operating modes. While ensuring the running results, it greatly reduces maintenance costs and debugging difficulty at the framework level.

Attached Figure Description

[0022] Figure 1a is a schematic diagram illustrating the different operating modes of operators in the PyTorch framework in one embodiment;

[0023] Figure 1b is an application environment diagram of the operator execution method in one embodiment;

[0024] Figure 2a is a flowchart illustrating the method of operator execution in one embodiment;

[0025] Figure 2b is a flowchart illustrating different operator modes in the PyTorch framework in another embodiment;

[0026] Figure 3 is a flowchart illustrating the method of operator execution in another embodiment;

[0027] Figure 4 is a flowchart illustrating the method of operator execution in another embodiment;

[0028] Figure 5 is a flowchart illustrating the method of operator execution in another embodiment;

[0029] Figure 6 is a flowchart illustrating the method of operator execution in another embodiment;

[0030] Figure 7 is a structural block diagram of the apparatus for operator operation in one embodiment;

[0031] Figure 8 is a structural block diagram of the apparatus for operator operation in another embodiment;

[0032] Figure 9 is a structural block diagram of the apparatus for operator operation in another embodiment;

[0033] Figure 10 is a structural diagram of the combined processing device in one embodiment;

[0034] Figure 11 is a schematic diagram of the board structure in one embodiment.

Detailed Implementation

[0035] The technical solutions in the embodiments of this disclosure will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this disclosure, not all of them. Based on the embodiments in this disclosure, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this disclosure.

[0036] It should be understood that the terms "first," "second," etc., in the claims, specification, and drawings of this disclosure are used to distinguish different objects, not to describe a specific order. The term "comprising" as used in the specification and claims of this disclosure indicates the presence of a described feature, integer, step, operation, element, and/or component, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or a collection thereof.

[0037] It should also be understood that the terminology used in this disclosure is for the purpose of describing particular embodiments only and is not intended to limit this disclosure. As used in this disclosure and claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term “and/or” as used in this disclosure and claims means any combination and all possible combinations of one or more of the associated listed items, and includes such combinations. As used in this disclosure and claims, the term “if” may be interpreted, depending on the context, as “when…” or “once” or “in response to determination” or “in response to detection.” Similarly, the phrase “if determined” or “if detected [the described condition or event]” may be interpreted, depending on the context, as meaning “once determined” or “in response to determination” or “once detected [the described condition or event]” or “in response to detection [the described condition or event].”

[0038] Before detailing the technical solutions of the embodiments disclosed herein, the technical background on which they are based is introduced. Taking an artificial intelligence processor as an example, the operator API supports two operating modes: layer-by-layer and fusion. In the layer-by-layer operating mode, each operator is compiled individually after generation (i.e., each operator is compiled once) and then executed. In the fusion operating mode, multiple generated operators are compiled together (i.e., all operators are compiled only once) and then executed. Accordingly, operators running in layer-by-layer mode are called layer-by-layer operators, and operators running in fusion mode are called fusion operators. Because the two modes rely on different mechanisms, operators are also used differently in each. Figure 1a shows the usage flow of operators in layer-by-layer mode and fusion mode within the PyTorch neural network framework. In Figure 1a, layer-by-layer operators are initiated through the dispatch system, while fusion operators are initiated from the Just-In-Time (JIT) compilation system to the fusion operator dispatch system (FusedKernel). As can be seen from Figure 1a, to support both layer-by-layer and fusion modes, two separate sets of operators need to be encapsulated at the framework level, which greatly increases maintenance costs and debugging difficulty. It should be noted that the applicant has devoted considerable creative effort to identifying these technical problems and developing the technical solutions described in the following embodiments. To address this deficiency, the operator operation method disclosed herein can support both layer-by-layer and fusion modes with a single set of operators.

[0039] The operator operation method provided in this disclosure can be applied, for example, in the application environment shown in Figure 1b. The application environment includes a computer device 01, which can be of any type, such as a personal computer, laptop, smartphone, tablet, or portable wearable device, or a standalone server or a server cluster composed of multiple servers. The internal structure of the computer device includes a processor 011, a non-volatile storage medium 012, internal memory 013, and a network interface 014. The processor 011 provides the computational and control capabilities for executing the operator execution method. The processor 011 can be any type of processor, including but not limited to a machine learning processor, an artificial intelligence processor (IPU), a central processing unit (CPU), a graphics processing unit (GPU), or a heterogeneous processor formed from these, and it can be installed in any type of computer device. The non-volatile storage medium 012 stores an operating system 0121, computer programs 0122, and a database 0123. The internal memory 013 provides an environment for running the operating system 0121 and the computer programs 0122 stored in the non-volatile storage medium 012, and the database 0123 stores data related to the operator execution method. The network interface 014 is used to communicate with external devices via a network connection.

[0040] To make the purpose, technical solution, and advantages of this disclosure clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of this disclosure and are not intended to limit it. The technical solution of this disclosure and how it solves the aforementioned technical problems will be described in detail below through embodiments and in conjunction with the accompanying drawings. The following specific embodiments can be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. It should be noted that the operator execution method provided in this disclosure is executed by a computer device or an artificial intelligence processor. The execution subject of this method can also be a device for operator execution, which can be implemented as part or all of the computer device or artificial intelligence processor through software, hardware, or a combination of software and hardware.

[0041] As shown in Figure 2a, a method for executing an operator is provided. After a target operator is generated, its execution mode is detected, and the target operator is then executed according to that mode. The method includes:

[0042] S101, generate the target operator in the deep learning framework according to the instruction.

[0043] The instruction refers to the instruction for generating the target operator. For example, after receiving the instruction to generate the target operator, the computer device generates the target operator according to the instruction. The computer device can receive the instruction directly as user input, including but not limited to voice input and input via peripheral devices. Alternatively, when a trigger condition is met, the computer device can actively retrieve a generation file from a database and parse the instruction from that file. Other methods are possible, and this disclosure does not limit the specific method used. For example, as shown in Figure 2b, for a computer device, receiving an operator call instruction dispatched by the dispatch system or the fusion operator dispatch system is equivalent to receiving the above instruction; at that point, the computer device begins generating the target operator.

[0044] After receiving instructions, the computer device begins generating target operators within a deep learning framework. This deep learning framework can be PyTorch, the Python version of Torch, a neural network framework for programming GPU-accelerated deep neural networks (DNNs). The deep learning framework is the first layer in the entire deep learning ecosystem, concretizing the deep learning tasks expressed by the computational graph structure of the neural network into instructions and data that can be executed on a CPU or AI processor. In this process, the deep learning framework uses operators as the concrete elements to implement the computational tasks, providing a kernel function for each operator that executes on the CPU or AI processor. Based on the computational graph, the deep learning framework schedules and executes the kernel function corresponding to each operator in the computational graph, completing the computation of the entire neural network. Essentially, it further breaks down neural network computation into various common operators for tensor data, and implements the functionality of the neural network by executing the kernel functions corresponding to each operator.
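The scheduling described above can be illustrated with a minimal Python sketch. It is not the PyTorch implementation: the kernel table `KERNELS`, the node layout, and the function `run_graph` are hypothetical names introduced only for illustration, and scalar values stand in for tensor data.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical kernel table: one kernel function per operator name.
KERNELS: Dict[str, Callable[..., float]] = {
    "add": lambda x, y: x + y,
    "mul": lambda x, y: x * y,
    "relu": lambda x: max(x, 0.0),
}

# A graph node: (output name, operator name, input names).
# Nodes are assumed to be listed in topological order.
Node = Tuple[str, str, List[str]]


def run_graph(nodes: List[Node], feeds: Dict[str, float]) -> Dict[str, float]:
    """Run the kernel of every operator in the graph, in order."""
    values = dict(feeds)
    for out, op, inputs in nodes:
        kernel = KERNELS[op]  # look up the kernel for this operator
        values[out] = kernel(*(values[name] for name in inputs))
    return values


graph: List[Node] = [
    ("t0", "mul", ["x", "w"]),
    ("t1", "add", ["t0", "b"]),
    ("y", "relu", ["t1"]),
]
result = run_graph(graph, {"x": 2.0, "w": -3.0, "b": 1.0})
print(result["y"])  # prints 0.0
```

Each node is resolved to exactly one kernel, which mirrors the statement that the network's functionality is obtained by executing the kernel functions of its operators.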

[0045] Operators in a deep learning framework need to be generated in advance so that they execute correctly when used by the neural network. The target operator refers to any operator to be generated in the deep learning framework. In practice, the computer device can execute the source code file of the target operator to obtain the generated target operator.

[0046] S102, based on the preset detection strategy, determine the operating mode of the target operator.

[0047] Continuing with Figure 2b, the operator call instructions dispatched by the dispatch system or the fusion operator dispatch system are triggered by the user when the deep learning network needs to be invoked. Although the user has already decided the operating mode of the operators in the deep learning network when triggering the call instruction, the computer device cannot yet tell which operating mode the target operator is in. Therefore, after generating the target operator but before actually executing it, the computer device needs to determine the operating mode of the target operator. The operating mode can be understood as the way the operator's program runs; different operating modes correspond to different ways of executing the operator. In practice, operator operating modes include, but are not limited to, delayed, triggered, composite, layer-by-layer, and fusion operating modes.

[0048] To further illustrate the operator's operating modes, we will use the layer-by-layer operating mode and the fusion operating mode as examples to explain the specific operating modes of operators. The mechanisms of the layer-by-layer mode and the fusion mode are different, and the methods of using operators are also different. In the layer-by-layer mode, the operator usage is: CreateOp (generate operator) -> CompileOp (compile operator) -> ComputeOp (execute each compiled operator); in the fusion mode, the operator usage is: CreateOp (generate operator) -> FuseOp (fuse compiled operator) -> ComputeOp (execute the fused compiled operator). That is, the layer-by-layer operating mode means that each operator is compiled separately after it is generated (i.e., each operator needs to be compiled once) and then executed, while the fusion operating mode means that after each operator is generated, multiple generated operators are compiled together (i.e., all operators are compiled only once) and then executed.
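The two usage flows can be compared with a short Python sketch that only records the order of calls. The step names CreateOp, CompileOp, FuseOp, and ComputeOp come from the text above; the Python functions, the string stand-ins for operators and binaries, and the trace list are assumptions made for illustration.

```python
from typing import List


def create_op(name: str, trace: List[str]) -> str:
    trace.append(f"CreateOp({name})")
    return name


def compile_op(op: str, trace: List[str]) -> str:
    trace.append(f"CompileOp({op})")
    return f"binary[{op}]"


def fuse_op(ops: List[str], trace: List[str]) -> str:
    # A single compilation covers every operator handed to the fusion step.
    joined = "+".join(ops)
    trace.append(f"FuseOp({joined})")
    return f"binary[{joined}]"


def compute_op(binary: str, trace: List[str]) -> None:
    trace.append(f"ComputeOp({binary})")


def layer_by_layer(names: List[str]) -> List[str]:
    """CreateOp -> CompileOp -> ComputeOp, repeated for every operator."""
    trace: List[str] = []
    for name in names:
        compute_op(compile_op(create_op(name, trace), trace), trace)
    return trace


def fusion(names: List[str]) -> List[str]:
    """CreateOp for every operator, then one FuseOp and one ComputeOp."""
    trace: List[str] = []
    ops = [create_op(name, trace) for name in names]
    compute_op(fuse_op(ops, trace), trace)
    return trace


layer_trace = layer_by_layer(["conv", "pool"])
fusion_trace = fusion(["conv", "pool"])
print(layer_trace)
print(fusion_trace)
```

In the layer-by-layer trace a compile step appears once per operator, while the fusion trace contains a single FuseOp entry for all operators.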

[0049] Computer devices can determine the operating mode of a target operator based on a preset detection strategy. Optionally, this preset detection strategy is a strategy determined based on the call stack rules of operators in different operating modes. That is, the detection strategy is a pre-established strategy used to determine the operating mode of the target operator. Specifically, in deep learning frameworks, such as the PyTorch framework, the call stacks of operators in layer-by-layer and fusion modes are completely different. In the PyTorch framework, the call of an operator in fusion mode must be initiated from the fusion operator dispatch system (FusedKernel). However, the dispatch system may initiate not only layer-by-layer operators but also operators in other modes. Therefore, a detection strategy is set based on this. For example, a global variable (RunningMode) can be defined, and RunningMode can be set to fusion when entering FusedKernel and set to layer-by-layer when FusedKernel ends. Then, the computer device can determine the operating mode of the target operator based on this, thus ensuring that the correct RunningMode is accessed in the underlying independent kernel, thereby accurately determining the operating mode of the target operator. In addition to this detection strategy, other methods can be used to detect and determine the operating mode of the target operator. This disclosed embodiment does not limit the determination of the operating mode of the target operator.
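One possible way to realize the RunningMode example is sketched below in Python. The names RunningMode and FusedKernel are taken from the text; the enum, the context manager `fused_kernel`, and the helper `kernel_sees` are hypothetical and do not correspond to any actual PyTorch API.

```python
from contextlib import contextmanager
from enum import Enum
from typing import Iterator


class RunningMode(Enum):
    LAYER_BY_LAYER = "layer-by-layer"
    FUSION = "fusion"


# Global variable that every underlying kernel can read.
running_mode = RunningMode.LAYER_BY_LAYER


@contextmanager
def fused_kernel() -> Iterator[None]:
    """Stand-in for FusedKernel: fusion on entry, layer-by-layer on exit."""
    global running_mode
    running_mode = RunningMode.FUSION
    try:
        yield
    finally:
        # Reset even if a kernel raises, so later calls see the right mode.
        running_mode = RunningMode.LAYER_BY_LAYER


def kernel_sees() -> RunningMode:
    """What an underlying independent kernel would observe when it runs."""
    return running_mode


outside = kernel_sees()
with fused_kernel():
    inside = kernel_sees()
after = kernel_sees()
print(outside.value, inside.value, after.value)
# prints: layer-by-layer fusion layer-by-layer
```

Resetting the variable in a `finally` clause is a design choice of this sketch; the text only requires that the mode be set back to layer-by-layer when FusedKernel ends.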

[0050] S103, if the target operator's operating mode is the first operating mode, execute the target operator in the first operating mode; if the target operator's operating mode is the second operating mode, execute the target operator in the second operating mode.

[0051] After the computer device generates the target operator and determines its operating mode, it executes the target operator according to the determined operating mode. Specifically, if the target operator's operating mode is the first operating mode, the target operator is executed according to the first operating mode; if the target operator's operating mode is the second operating mode, the target operator is executed according to the second operating mode.

[0052] Optionally, the first running mode is a layer-by-layer running mode, and the second running mode is a fusion running mode. If the target operator's running mode is the first running mode, the target operator is executed in the layer-by-layer running mode, specifically, the target operator is compiled separately. If the target operator's running mode is the second running mode, the target operator is executed in the fusion running mode, specifically, the target operator is fused and compiled with other generated operators.

[0053] The operator execution method, apparatus, and related products provided in this disclosure embodiment generate a target operator in a deep learning framework according to a generation instruction. Then, based on a preset detection strategy, the execution mode of the target operator is determined. If the execution mode of the target operator is a first execution mode, the target operator is executed in the first execution mode; if the execution mode of the target operator is a second execution mode, the target operator is executed in the second execution mode. In this method, by determining the execution mode of the operator based on a preset detection strategy before execution, it is equivalent to setting up an operator execution mode detection step. The execution mode of the target operator can be determined through this step, and then the operator is executed in the corresponding execution mode. In this way, it is not necessary to separately encapsulate operators of different modes in the deep learning framework. This allows the deep learning framework to encapsulate only one set of operators to achieve operators that simultaneously support multiple execution modes. While ensuring the execution results, it greatly reduces maintenance costs and debugging difficulty at the framework level.

[0054] Based on the above embodiments, the process of determining the operating mode of the target operator based on a preset detection strategy is described in detail below. As shown in Figure 3, this embodiment includes the following steps:

[0055] S201, Detect whether a preset global flag exists in the fusion distribution system.

[0056] In this embodiment, the global flag refers to a flag that every subsystem in the deep learning network can recognize and that identifies the target operator's operating mode. Optionally, this preset global flag is set at a specific time based on the PyTorch mechanism; that time is determined within the kernel, specifically when RunningMode is set via a global variable. Because the layer-by-layer mode and the fusion mode operate differently, operator calls in fusion mode are always initiated from FusedKernel (the fusion operator dispatch system). The global flag RunningMode is set to fusion upon entering FusedKernel and to layer-by-layer upon exiting FusedKernel, which ensures that the correct RunningMode is read in the underlying independent kernel and enables a single operator kernel to support both layer-by-layer and fusion modes. As shown in Figure 2b, the global flag is preset so that any operator whose call instruction is dispatched by the fusion operator dispatch system needs to run in fusion mode, i.e., it is a fusion operator, whereas any operator whose call instruction is not dispatched by the fusion operator dispatch system needs to run in layer-by-layer mode, i.e., it is a layer-by-layer operator. The global flag can be a specific numerical value, a letter, a combination of numbers and letters, and so on; this disclosed embodiment does not limit this.

[0057] S202, if a global flag exists in the fusion distribution system, the target operator's operating mode is determined to be the first operating mode; if no global flag exists in the fusion distribution system, the target operator's operating mode is determined to be the second operating mode.

[0058] Optionally, taking the first operating mode as a fusion operating mode and the second operating mode as a layer-by-layer operating mode as an example, after the computer device generates the target operator, it checks whether a global flag bit exists in the fusion distribution system (i.e., the fusion operator distribution system). If it exists, the operating mode of the target operator is determined to be the first operating mode (fusion operating mode). If it does not exist, the operating mode of the target operator is determined to be the second operating mode (layer-by-layer operating mode).
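Steps S201 and S202 reduce to a presence check, which the following Python sketch illustrates. The dictionary that models the state of the fusion distribution system, the key name `FLAG_KEY`, and the function `determine_mode` are assumptions for illustration only.

```python
from typing import Dict, Union

# The flag may be a number, a letter, or a mix of both.
FlagValue = Union[int, str]

# Hypothetical key under which the fusion distribution system keeps the flag.
FLAG_KEY = "global_flag"


def determine_mode(fusion_dispatch_state: Dict[str, FlagValue]) -> str:
    """S201: detect the flag. S202: map its presence to an operating mode."""
    if FLAG_KEY in fusion_dispatch_state:
        return "fusion"          # first operating mode in this embodiment
    return "layer-by-layer"      # second operating mode in this embodiment


print(determine_mode({FLAG_KEY: 1}))    # prints fusion
print(determine_mode({FLAG_KEY: "F1"}))  # prints fusion
print(determine_mode({}))                # prints layer-by-layer
```

The sketch tests only whether the flag exists, not what value it holds, so a flag of `0` would still select fusion mode here.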

[0059] In this disclosed embodiment, the operation mode of the operator is distinguished by setting a global flag. It is only necessary to determine whether there is a preset global flag in the fusion distribution system to determine the operation mode of the target operator. Then, the target operator is executed in the corresponding operation mode. In this way, there is no need to separately encapsulate operators of different modes in the deep learning framework. This allows the deep learning framework to encapsulate only one set of operators to achieve operators that support multiple operation modes at the same time, thereby greatly reducing maintenance costs and debugging difficulty at the framework level.

[0060] After the operating mode of the target operator is determined, the target operator is executed in that mode. As mentioned earlier, layer-by-layer operators and fusion operators are used differently: a layer-by-layer operator follows the normal process of being generated, compiled individually, and then executed, whereas fusion operators, once generated, are fused and compiled together with all other generated operators before execution. The execution of the target operator is therefore explained below taking the first operating mode as the fusion operating mode and the second operating mode as the layer-by-layer operating mode. In one embodiment, as shown in Figure 4, if the operating mode is the layer-by-layer operating mode, then S103 above includes the following steps:

[0061] S301, compile the target operator to obtain the executable file of the target operator.

[0062] S302, run the executable file of the target operator on the artificial intelligence processor.

[0063] When executing a target operator, it needs to be compiled into an executable file, and then the resulting executable file is run on an AI processor to complete the target operator operation. This embodiment uses a layer-by-layer execution mode for the target operator, so it can be directly compiled and run. For example, suppose the target operators include three: operator A, operator B, and operator C. The operational logic between these three operators is that the output of operator A serves as all the inputs of operator B, the outputs of operator B and operator A together serve as the inputs of operator C, and the output of operator C is the final result. Therefore, executing these three operators in a layer-by-layer execution mode involves first executing operator A, obtaining its output, and storing it in memory. When executing operator B, the output of operator A is retrieved from memory as its input, and similarly, the output of operator B is also stored in memory. When executing operator C, the outputs of operators A and B are retrieved from memory as its input.
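The A, B, C example can be traced with a small Python sketch that logs every memory access. The arithmetic inside `op_a`, `op_b`, and `op_c` is a placeholder chosen for illustration, and the dictionary `memory` is a hypothetical stand-in for device memory.

```python
from typing import Dict, List

memory: Dict[str, float] = {}   # stand-in for memory holding operator outputs
accesses: List[str] = []        # log of every store and load


def store(name: str, value: float) -> None:
    memory[name] = value
    accesses.append(f"store {name}")


def load(name: str) -> float:
    accesses.append(f"load {name}")
    return memory[name]


# Placeholder operator bodies; only the data flow between them matters.
def op_a(x: float) -> float:
    return x + 1.0


def op_b(a: float) -> float:
    return a * 2.0


def op_c(a: float, b: float) -> float:
    return a + b


def run_layer_by_layer(x: float) -> float:
    store("A", op_a(x))                 # run A, keep its output in memory
    store("B", op_b(load("A")))         # B reads A's output, stores its own
    return op_c(load("A"), load("B"))   # C reads both stored outputs


result = run_layer_by_layer(3.0)
print(result)     # prints 12.0
print(accesses)
```

The access log contains two stores and three loads, which is the memory traffic that layer-by-layer execution incurs between separately compiled operators.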

[0064] In another embodiment, as shown in Figure 5, if the operating mode is the fusion operating mode, then S103 above includes the following steps:

[0065] S401, the target operator is sent to the fusion unit.

[0066] S402, in the fusion unit, the target operator and other operators in the fusion unit are compiled simultaneously, and the compiled executable file is run on the artificial intelligence processor.

[0067] In fusion mode, the target operator needs to be sent to the fusion unit (or fusion container) and is then compiled as a whole by the fusion unit. The compiled executable file is then run on the AI processor to complete the operation of the target operator. Taking operators A, B, and C as an example, running these three operators in fusion mode means placing them in the same container. When the three operators run in this container, the output of operator A does not need to be stored; it is passed directly to operators B and C as input. Similarly, the output of operator B is passed directly to operator C as input. Thus, when operators are executed in fusion mode, intermediate results do not need to be stored in memory, which saves operational resources.
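A fusion unit of this kind might be sketched in Python as follows. The class `FusionUnit`, its methods, and the placeholder lambdas for operators A, B, and C are hypothetical; composing Python callables merely stands in for compiling the operators into one executable.

```python
from typing import Callable, List, Tuple

# (operator name, placeholder kernel, names of the values it consumes)
Op = Tuple[str, Callable[..., float], List[str]]


class FusionUnit:
    """Hypothetical fusion container: collects operators, compiles them once."""

    def __init__(self) -> None:
        self.ops: List[Op] = []
        self.compile_count = 0

    def add(self, name: str, fn: Callable[..., float], inputs: List[str]) -> None:
        # S401: the target operator is sent to the fusion unit.
        self.ops.append((name, fn, inputs))

    def compile(self) -> Callable[[float], float]:
        # S402: every operator in the unit is compiled together, exactly once.
        self.compile_count += 1
        ops = list(self.ops)

        def fused(x: float) -> float:
            # Intermediate outputs live only inside this call and are handed
            # straight to their consumers; nothing goes to a shared store.
            values = {"x": x}
            for name, fn, inputs in ops:
                values[name] = fn(*(values[i] for i in inputs))
            return values[ops[-1][0]]

        return fused


unit = FusionUnit()
unit.add("A", lambda x: x + 1.0, ["x"])
unit.add("B", lambda a: a * 2.0, ["A"])
unit.add("C", lambda a, b: a + b, ["A", "B"])
fused_program = unit.compile()
result = fused_program(3.0)
print(result, unit.compile_count)  # prints 12.0 1
```

Three operators are added but `compile_count` stays at one, matching the statement that all operators in the fusion unit are compiled together.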

[0068] In this disclosed embodiment, the target operator is executed according to the running mode corresponding to the target operator. Different running modes have different running methods, which ensures the accuracy of the execution of the target operator.

[0069] As shown in Figure 6, this disclosure also provides a multi-mode operation method, which includes the following steps:

[0070] S1, generate the target operator within the deep learning framework according to the instructions.

[0071] S2, detect whether a preset global flag exists in the fusion distribution system.

[0072] S3, if a global flag exists in the fusion distribution system, determine that the target operator is in fusion operation mode.

[0073] S4, send the target operator to the fusion unit.

[0074] S5, compile the target operator and other operators in the fusion unit simultaneously, and run the compiled executable file on the artificial intelligence processor.

[0075] S6, if there is no global flag in the fusion distribution system, determine that the target operator is in layer-by-layer operation mode.

[0076] S7, compile the target operator to obtain the executable file of the target operator.

[0077] S8, run the executable file of the target operator on the artificial intelligence processor.

[0078] Specifically, in this embodiment, the organization and invocation process of the framework-level operators is as follows:

[0079]

[0080] To enable a single operator to support both layer-by-layer and fusion modes, this disclosed embodiment adds a CheckFuse (operator execution mode detection) step to the organization and invocation of operators at the framework level. The CheckFuse step detects whether the target operator's execution mode is fusion mode. If it is fusion mode, FuseOp (compiling the fused operators and executing the compiled result) is run; if it is layer-by-layer mode, CompileOp (compiling the separately generated operator) -> ComputeOp (executing the separately compiled operator) is executed. By organizing operators according to the strategy disclosed herein, a single set of operators suffices to support both layer-by-layer and fusion modes, which significantly reduces maintenance costs and debugging difficulty while ensuring the operating results.
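The invocation flow described above can be sketched as follows. The step names CheckFuse, FuseOp, CompileOp, and ComputeOp come from the text; the `Framework` class, its `fusion_flag` attribute, and the `trace` list are assumptions made purely for illustration, and "compilation" is reduced to passing a Python callable through.

```python
from dataclasses import dataclass, field


@dataclass
class Framework:
    """Hypothetical framework layer holding a single set of operators."""
    fusion_flag: bool = False                  # stand-in for the preset global flag
    trace: list = field(default_factory=list)  # records which steps were taken

    def check_fuse(self):
        self.trace.append("CheckFuse")
        return self.fusion_flag

    def fuse_op(self, op, x):
        self.trace.append("FuseOp")            # compile within the fusion unit and run
        return op(x)

    def compile_op(self, op):
        self.trace.append("CompileOp")         # compile the operator on its own
        return op

    def compute_op(self, executable, x):
        self.trace.append("ComputeOp")         # run the separately compiled operator
        return executable(x)

    def run(self, op, x):
        if self.check_fuse():
            return self.fuse_op(op, x)
        return self.compute_op(self.compile_op(op), x)


def double(x):
    return 2 * x


layer = Framework(fusion_flag=False)
print(layer.run(double, 5), layer.trace)  # 10 ['CheckFuse', 'CompileOp', 'ComputeOp']

fused = Framework(fusion_flag=True)
print(fused.run(double, 5), fused.trace)  # 10 ['CheckFuse', 'FuseOp']
```

The same `double` operator is used in both paths, which mirrors the point that only one operator encapsulation is needed at the framework level.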

[0081] It should be understood that although the steps in the flowchart above are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowchart above may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages in other steps.

[0082] In one embodiment, as shown in Figure 7, an apparatus for operating an operator is provided, the apparatus comprising: a generation module 10, a determination module 11, and a running module 12, wherein:

[0083] The generation module 10 is used to generate target operators in the deep learning framework according to instructions;

[0084] The determination module 11 is used to determine the operating mode of the target operator based on a preset detection strategy;

[0085] The running module 12 is used to execute the target operator in the first running mode if the target operator's running mode is the first running mode, and to execute the target operator in the second running mode if the target operator's running mode is the second running mode.

[0086] In one embodiment, the detection strategy is determined based on the call stack rules of operators in different operating modes.

[0087] In one embodiment, as shown in Figure 8, the determination module 11 includes: a detection unit 111 and a determination unit 112, wherein:

[0088] The detection unit 111 is used to detect whether a preset global flag bit exists in the fusion distribution system;

[0089] The determining unit 112 is used to determine the operating mode of the target operator as the first operating mode if a global flag exists in the fusion distribution system, and to determine the operating mode of the target operator as the second operating mode if no global flag exists in the fusion distribution system.

[0090] In one embodiment, the preset global flag is set at a specific time based on the PyTorch mechanism. Furthermore, this specific time is determined within the kernel using the PyTorch mechanism. Specifically, the preset global flag is set when the RunningMode is set via a global variable. In detail, because the layer-by-layer mode and the fusion mode operate differently, operator calls in fusion mode are always initiated from the FusedKernel. The global flag RunningMode is set to fusion upon entering the FusedKernel and layer-by-layer upon exiting the FusedKernel, ensuring that the correct RunningMode is accessed in the underlying independent kernel. This allows a single operator kernel to support both layer-by-layer and fusion modes.
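The flag handling described above can be sketched with a context manager. This is plain Python and not a PyTorch API: the names `running_mode`, `fused_kernel`, and `operator_kernel` are hypothetical, and the sketch only illustrates setting the global flag to fusion on entering the FusedKernel and back to layer-by-layer on leaving it.

```python
from contextlib import contextmanager

LAYER_BY_LAYER = "layer-by-layer"
FUSION = "fusion"

# Hypothetical global flag; layer-by-layer is assumed to be the default.
running_mode = LAYER_BY_LAYER


@contextmanager
def fused_kernel():
    """Stand-in for FusedKernel: set the flag on entry, reset it on exit."""
    global running_mode
    running_mode = FUSION
    try:
        yield
    finally:
        running_mode = LAYER_BY_LAYER  # reset even if an operator raises


def operator_kernel(x):
    """A single operator kernel that reads the flag to learn its mode."""
    return running_mode, x * 2


print(operator_kernel(1))      # ('layer-by-layer', 2)
with fused_kernel():
    print(operator_kernel(1))  # ('fusion', 2)
print(operator_kernel(1))      # ('layer-by-layer', 2)
```

Resetting the flag in a `finally` clause is a design choice of this sketch; it keeps the flag consistent when an operator call inside the fused region fails.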

[0091] In one embodiment, as shown in Figure 9, the above-mentioned running module 12 includes: a compilation unit 121 and an execution unit 122, wherein:

[0092] Compilation unit 121 is used to compile the target operator to obtain the executable file of the target operator;

[0093] The execution unit 122 is used to run the executable file of the target operator on the artificial intelligence processor.

[0094] In one embodiment, the compilation unit 121 is further configured to send the target operator to the fusion unit; the execution unit 122 is further configured to compile the target operator and other operators in the fusion unit simultaneously in the fusion unit, and run the compiled executable file on the artificial intelligence processor.

[0095] For specific limitations on the apparatus for operator execution, refer to the limitations on the operator execution method described above; they are not repeated here. Each module in the aforementioned apparatus can be implemented entirely or partially in software, hardware, or a combination thereof. These modules can be embedded in, or be independent of, the processor of a computer device in hardware form, or stored in the memory of a computer device in software form, so that the processor can call and execute the operations corresponding to each module.

[0096] In one embodiment, this disclosure also provides a data processing apparatus, including a processor and a memory, the memory storing a computer program, and the processor executing the computer program to implement the steps in any of the embodiments of the above-described operator running methods.

[0097] The data processing device provided in this embodiment has a similar implementation principle and technical effect to the above-described operator operation method embodiment, and will not be described again here.

[0098] Figure 10 is a structural diagram illustrating a combined processing apparatus 1000 according to an embodiment of this disclosure. As shown in Figure 10, the combined processing apparatus 1000 includes a computing processing device 1002, an interface device 1004, other processing devices 1006, and a storage device 1008. Depending on the application scenario, the computing processing device may include one or more computing devices 1010, which can be configured to perform the multi-mode operator operations described herein in conjunction with the accompanying drawings.

[0099] In different embodiments, the computing processing apparatus disclosed herein can be configured to perform user-specified operations. In exemplary applications, the computing processing apparatus can be implemented as a single-core artificial intelligence processor or a multi-core artificial intelligence processor. Similarly, one or more computing devices included within the computing processing apparatus can be implemented as an artificial intelligence processor core or a portion of the hardware structure of an artificial intelligence processor core. When multiple computing devices are implemented as artificial intelligence processor cores or portions of the hardware structure of artificial intelligence processor cores, the computing processing apparatus disclosed herein can be considered to have a single-core structure or a homogeneous multi-core structure.

[0100] In exemplary operation, the computing processing device disclosed herein can interact with other processing devices through an interface device to jointly complete user-specified operations. Depending on the implementation, the other processing devices disclosed herein may include one or more types of general-purpose and / or special-purpose processors, such as a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), and an artificial intelligence processor. These processors may include, but are not limited to, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc., and their number can be determined according to actual needs. As mentioned above, the computing processing device disclosed herein, taken alone, can be considered to have a single-core structure or a homogeneous multi-core structure. However, when the computing processing device and other processing devices are considered together, they can be considered to form a heterogeneous multi-core structure.

[0101] In one or more embodiments, the other processing device may serve as an interface between the computing processing device disclosed herein (which may be embodied as a computing device for artificial intelligence operations, such as neural network operations) and external data and control, performing basic control including but not limited to data transfer and starting and / or stopping the computing device. In another embodiment, the other processing device may also cooperate with the computing processing device to jointly complete computational tasks.

[0102] In one or more embodiments, the interface device can be used to transfer data and control commands between a computing processing device and other processing devices. For example, the computing processing device can obtain input data from other processing devices via the interface device and write it to on-chip storage (or memory) of the computing processing device. Further, the computing processing device can obtain control commands from other processing devices via the interface device and write them to on-chip control cache of the computing processing device. Alternatively or optionally, the interface device can also read data from the storage device of the computing processing device and transmit it to other processing devices.

[0103] Additionally or optionally, the combined processing apparatus disclosed herein may further include a storage device. As shown in the figures, the storage device is connected to both the computing processing device and the other processing device. In one or more embodiments, the storage device may be used to store data from the computing processing device and / or the other processing device. For example, the data may be data that cannot be fully stored in the internal or on-chip storage of the computing processing device or other processing device.

[0104] In some embodiments, this disclosure also discloses a chip (e.g., the chip 1102 shown in Figure 11). In one implementation, the chip is a system-on-a-chip (SoC) that integrates one or more combined processing apparatuses as shown in Figure 10. The chip can be connected to other related components through an external interface device (such as the external interface device 1106 shown in Figure 11). These related components may be, for example, a camera, monitor, mouse, keyboard, network card, or Wi-Fi interface. In some applications, the chip may integrate other processing units (e.g., video codecs) and / or interface modules (e.g., DRAM interfaces). In some embodiments, this disclosure also discloses a chip package structure that includes the aforementioned chip. In some embodiments, this disclosure also discloses a board that includes the aforementioned chip package structure. The board is described in detail below with reference to Figure 11.

[0105] Figure 11 is a schematic diagram illustrating the structure of a board 1100 according to an embodiment of this disclosure. As shown in Figure 11, the board includes a storage device 1104 for storing data, which includes one or more storage cells 1110. The storage device can be connected to, and exchange data with, the controller 1108 and the aforementioned chip 1102 via, for example, a bus. Furthermore, the board also includes an external interface device 1106, configured for data relay or switching between the chip (or a chip in a chip package) and an external device 1112 (e.g., a server or computer). For example, data to be processed can be transferred from the external device to the chip via the external interface device. Conversely, the calculation results of the chip can be transmitted back to the external device via the external interface device. Depending on the application scenario, the external interface device can have different interface forms; for example, it can adopt a standard PCIe interface.

[0106] In one or more embodiments, the controller in the disclosed board can be configured to regulate the state of the chip. Therefore, in one application scenario, the controller may include a microcontroller (MCU) for regulating the operating state of the chip.

[0107] Based on the above description in conjunction with Figure 10 and Figure 11, those skilled in the art will understand that this disclosure also discloses an electronic device or apparatus that may include one or more of the aforementioned boards, one or more of the aforementioned chips, and / or one or more of the aforementioned combined processing apparatuses.

[0108] Depending on the application scenario, the electronic devices or apparatus disclosed herein may include servers, cloud servers, server clusters, data processing devices, robots, computers, printers, scanners, tablets, smart terminals, PC devices, IoT terminals, mobile terminals, mobile phones, dashcams, navigators, sensors, cameras, video cameras, projectors, watches, headphones, mobile storage, wearable devices, visual terminals, autonomous driving terminals, vehicles, home appliances, and / or medical devices. The vehicles include airplanes, ships, and / or vehicles; the home appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, lights, gas stoves, and range hoods; the medical devices include MRI scanners, ultrasound machines, and / or electrocardiographs. The electronic devices or apparatus disclosed herein can also be applied in fields such as the Internet, IoT, data centers, energy, transportation, public management, manufacturing, education, power grids, telecommunications, finance, retail, construction sites, and healthcare. Furthermore, the electronic devices or apparatus disclosed herein can also be used in application scenarios related to artificial intelligence, big data, and / or cloud computing, such as cloud computing, edge computing, and terminal applications. In one or more embodiments, the high-computing-power electronic devices or apparatuses according to the present disclosure can be applied to cloud devices (e.g., cloud servers), while the low-power electronic devices or apparatuses can be applied to terminal devices and / or edge devices (e.g., smartphones or cameras). 
In one or more embodiments, the hardware information of the cloud devices and the hardware information of the terminal devices and / or edge devices are compatible with each other, so that suitable hardware resources can be matched from the hardware resources of the cloud devices to simulate the hardware resources of the terminal devices and / or edge devices based on the hardware information of the terminal devices and / or edge devices, so as to complete the unified management, scheduling and collaborative work of the end-to-cloud or cloud-edge-end integration.

[0109] It should be noted that, for the sake of brevity, this disclosure describes some methods and their embodiments as a series of actions and combinations thereof. However, those skilled in the art will understand that the solutions disclosed herein are not limited by the order of the described actions. Therefore, based on the disclosure or teachings of this document, those skilled in the art will understand that some steps can be performed in a different order or simultaneously. Furthermore, those skilled in the art will understand that the embodiments described in this disclosure can be considered optional embodiments, that is, the actions or modules involved are not necessarily essential for the implementation of one or more solutions disclosed herein. In addition, depending on the solution, the description of some embodiments in this disclosure may have different emphases. In view of this, those skilled in the art will understand that parts not described in detail in a certain embodiment of this disclosure can also be referred to the relevant descriptions of other embodiments.

[0110] In terms of specific implementation, based on the disclosure and teachings of this document, those skilled in the art will understand that several embodiments disclosed herein can also be implemented in other ways not disclosed herein. For example, regarding the various units in the electronic device or apparatus embodiments described above, this document divides them based on logical functions, but in actual implementation, there may be other division methods. As another example, multiple units or components can be combined or integrated into another system, or some features or functions in a unit or component can be selectively disabled. Regarding the connection relationships between different units or components, the connections discussed above in conjunction with the accompanying drawings can be direct or indirect couplings between units or components. In some scenarios, the aforementioned direct or indirect couplings involve communication connections utilizing interfaces, where the communication interface can support electrical, optical, acoustic, magnetic, or other forms of signal transmission.

[0111] In this disclosure, the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units. The aforementioned components or units may be located in the same location or distributed across multiple network units. Furthermore, depending on actual needs, some or all of the units can be selected to achieve the purpose of the solution described in the embodiments of this disclosure. Additionally, in some scenarios, multiple units in the embodiments of this disclosure may be integrated into one unit or each unit may exist physically independently.

[0112] In some implementation scenarios, the integrated unit described above can be implemented as a software program module. If implemented as a software program module and sold or used as an independent product, the integrated unit can be stored in a computer-readable memory. Therefore, when the disclosed solution is embodied in a software product (e.g., a computer-readable storage medium), the software product can be stored in a memory and may include several instructions that cause a computer device (e.g., a personal computer, server, or network device) to execute some or all of the steps of the method described in the embodiments of this disclosure. The aforementioned memory may include, but is not limited to, various media capable of storing program code, such as USB flash drives, flash disks, read-only memory (ROM), random access memory (RAM), portable hard drives, magnetic disks, or optical disks.

[0113] In other implementation scenarios, the integrated units described above can also be implemented in hardware, i.e., as specific hardware circuits, which may include digital circuits and / or analog circuits. The physical implementation of the circuit's hardware structure may include, but is not limited to, physical devices, which may include, but are not limited to, transistors or memristors. Therefore, the various devices described herein (e.g., computing devices or other processing devices) can be implemented using appropriate hardware processors, such as CPUs, GPUs, FPGAs, DSPs, and ASICs. Furthermore, the aforementioned storage units or storage devices can be any suitable storage medium (including magnetic storage media or magneto-optical storage media), such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (EDRAM), high-bandwidth memory (HBM), hybrid memory cube (HMC), ROM, and RAM.

[0114] While numerous embodiments of this disclosure have been shown and described herein, it will be apparent to those skilled in the art that such embodiments are provided by way of example only. Many modifications, alterations, and alternatives will occur to those skilled in the art without departing from the spirit and intent of this disclosure. It should be understood that various alternatives to the embodiments of this disclosure described herein may be employed in the practice of this disclosure. The appended claims are intended to define the scope of this disclosure and therefore cover equivalents or alternatives within the scope of these claims.

[0115] In the above embodiments, the descriptions of each embodiment have their own emphasis. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments. The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as the combination of these technical features does not contradict each other, it should be considered within the scope of this specification.

[0116] The foregoing can be better understood in light of the following clauses:

[0117] Clause A1, a method of operating an operator, the method comprising:

[0118] Based on the instructions, the target operator is generated in the deep learning framework;

[0119] Based on a preset detection strategy, the operating mode of the target operator is determined;

[0120] If the target operator operates in a first operating mode, the target operator is executed in the first operating mode; if the target operator operates in a second operating mode, the target operator is executed in the second operating mode.

[0121] Clause A2, pursuant to the method described in Clause A1, wherein the detection strategy is determined based on the call stack rules of operators for different operating modes.

[0122] Clause A3, according to the method described in Clauses A1 or A2, when the first operating mode is a fusion operating mode and the second operating mode is a layer-by-layer operating mode, determining the operating mode of the target operator based on a preset detection strategy includes:

[0123] Detect whether a preset global flag exists in the fusion distribution system;

[0124] If the global flag bit exists in the fusion distribution system, the operating mode of the target operator is determined to be the first operating mode; if the global flag bit does not exist in the fusion distribution system, the operating mode of the target operator is determined to be the second operating mode.

[0125] Clause A4, pursuant to the method described in Clause A3, wherein the preset global flag is set at a specific time based on the PyTorch mechanism.

[0126] Clause A5, according to the method described in Clause A4, the execution of the target operator in the first operating mode includes:

[0127] The target operator is compiled to obtain an executable file of the target operator;

[0128] The executable file of the target operator is run on an artificial intelligence processor.

[0129] Clause A6, according to the method described in Clause A4, the execution of the target operator in the second operating mode includes:

[0130] The target operator is sent to the fusion unit;

[0131] In the fusion unit, the target operator and other operators in the fusion unit are compiled simultaneously, and the compiled executable file is run on the artificial intelligence processor.

[0132] Clause A7, an apparatus for operating an operator, the apparatus comprising:

[0133] The generation module is used to generate target operators in the deep learning framework according to instructions;

[0134] The determination module is used to determine the operating mode of the target operator based on a preset detection strategy;

[0135] The execution module is configured to execute the target operator in the first execution mode if the execution mode of the target operator is the first execution mode, and execute the target operator in the second execution mode if the execution mode of the target operator is the second execution mode.

[0136] Clause A8. The apparatus described in Clause A7, wherein the detection strategy is determined based on the call stack rules of operators for different operating modes.

[0137] Clause A9. The apparatus according to Clause A7 or A8, wherein the determining module comprises: a detection unit and a determining unit, wherein,

[0138] The detection unit is used to detect whether a preset global flag exists in the fusion distribution system;

[0139] The determining unit is configured to determine the target operator's operating mode as the first operating mode if the global flag bit exists in the fusion distribution system; and to determine the target operator's operating mode as the second operating mode if the global flag bit does not exist in the fusion distribution system.

[0140] Clause A10, the apparatus described in Clause A7 or A8, wherein the preset global flag is set at a specific time based on the PyTorch mechanism.

[0141] Clause A11. The apparatus according to Clause A10, wherein the running module comprises: a compilation unit and an execution unit; the compilation unit is configured to compile the target operator to obtain an executable file of the target operator; the execution unit is configured to run the executable file of the target operator on an artificial intelligence processor.

[0142] Clause A12, the apparatus according to Clause A10, wherein the compilation unit is further configured to send the target operator to the fusion unit; and the execution unit is further configured to compile the target operator and other operators in the fusion unit simultaneously in the fusion unit, and run the compiled executable file on the artificial intelligence processor.

[0143] Clause A13. A data processing apparatus, comprising a memory and a processor, the memory storing a computer program, the processor executing the computer program to implement the steps of any one of Clauses A1 to A6.

[0144] Clause A14. A combined processing apparatus comprising a data processing apparatus as described in Clause A13, a general interconnect interface, and other processing apparatuses besides the data processing apparatus; the data processing apparatus interacts with the other processing apparatuses.

[0145] Clause A15, a chip comprising a combined processing apparatus as described in Clause A14.

[0146] Clause A16, a board comprising the chip as described in Clause A15.

[0147] Clause A17. An electronic device comprising a board as described in Clause A16.

[0148] Clause A18. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of any one of Clauses A1 to A6.

[0149] Clause A19. A storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the method described in any one of Clauses A1 to A6.

[0150] The embodiments of this disclosure have been described in detail above. Specific examples have been used to illustrate the principles and implementation methods of this disclosure. The descriptions of the embodiments above are only for the purpose of helping to understand the methods and core ideas of this disclosure. Furthermore, any changes or modifications made by those skilled in the art based on the ideas of this disclosure, and on the specific implementation methods and application scope of this disclosure, are all within the scope of protection of this disclosure. Therefore, the content of this specification should not be construed as a limitation of this disclosure.

Claims

1. A method for operating an operator, characterized in that the method includes: generating a target operator in a deep learning framework according to instructions; determining the operating mode of the target operator based on a preset detection strategy; if the operating mode of the target operator is a first operating mode, executing the target operator in the first operating mode; and if the operating mode of the target operator is a second operating mode, executing the target operator in the second operating mode; wherein, when the first operating mode is a fusion operating mode and the second operating mode is a layer-by-layer operating mode, determining the operating mode of the target operator based on the preset detection strategy includes: detecting whether a preset global flag exists in a fusion distribution system, wherein the global flag is set, based on the PyTorch mechanism, to fusion when entering the fusion distribution system and to layer-by-layer when the fusion distribution system ends; if the global flag exists in the fusion distribution system, determining that the operating mode of the target operator is the first operating mode; and if the global flag does not exist in the fusion distribution system, determining that the operating mode of the target operator is the second operating mode.

2. The method according to claim 1, characterized in that the detection strategy is determined based on the call stack rules of operators in different operating modes.

3. The method according to claim 1 or 2, characterized in that executing the target operator in the second operating mode includes: compiling the target operator to obtain an executable file of the target operator; and running the executable file of the target operator on an artificial intelligence processor.

4. The method according to claim 1 or 2, characterized in that executing the target operator in the first operating mode includes: sending the target operator to a fusion unit; and, in the fusion unit, compiling the target operator and other operators in the fusion unit simultaneously, and running the compiled executable file on the artificial intelligence processor.

5. An apparatus for operating an operator, characterized in that the apparatus includes: a generation module, used to generate a target operator in a deep learning framework according to instructions; a determination module, used to determine the operating mode of the target operator based on a preset detection strategy; and a running module, configured to execute the target operator in a first operating mode if the operating mode of the target operator is the first operating mode, and to execute the target operator in a second operating mode if the operating mode of the target operator is the second operating mode; wherein the determination module is further configured, when the first operating mode is a fusion operating mode and the second operating mode is a layer-by-layer operating mode, to detect whether a preset global flag exists in a fusion distribution system; if the global flag exists in the fusion distribution system, to determine that the operating mode of the target operator is the first operating mode; and if the global flag does not exist in the fusion distribution system, to determine that the operating mode of the target operator is the second operating mode; and wherein the global flag is set, based on the PyTorch mechanism, to fusion when entering the fusion distribution system and to layer-by-layer when the fusion distribution system ends.

6. The apparatus according to claim 5, characterized in that the detection strategy is determined based on the call stack rules of operators in different operating modes.

7. The apparatus according to claim 5, characterized in that the running module includes: a compilation unit, used to compile the target operator to obtain an executable file of the target operator; and an execution unit, used to run the executable file of the target operator on an artificial intelligence processor.

8. The apparatus according to claim 7, characterized in that the compilation unit is further used to send the target operator to a fusion unit; and the execution unit is further configured to compile the target operator and other operators in the fusion unit simultaneously, and to run the compiled executable file on the artificial intelligence processor.

9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 4.

10. A storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 4.