Methods, computer programs, and apparatus (model-independent input transformation for neural networks)
By learning the input transformation function to minimize task loss and post-activation density loss, the problem of high energy consumption in the neural network inference stage is solved, achieving energy reduction without affecting accuracy and supporting adaptive energy consumption adjustment for different hardware architectures.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Patents
- Current Assignee / Owner
- INTERNATIONAL BUSINESS MACHINES CORPORATION
- Filing Date
- 2022-12-22
- Publication Date
- 2026-06-19
AI Technical Summary
Existing neural networks consume a lot of energy during the inference phase, and traditional methods struggle to dynamically adjust energy consumption to meet demands. Furthermore, model compression or low-precision hardware can affect accuracy.
A model-independent input transformation technique is adopted: a first machine learning system learns an input transformation function that minimizes the task loss and the post-activation density loss, the input data is transformed with this function to reduce the energy consumption of the inference task, and a second machine learning system performs the inference task.
It achieves reduced power consumption during the inference stage without affecting accuracy, and supports adaptive power consumption adjustment under different hardware architectures, providing a flexible trade-off between power consumption and accuracy.
Abstract
Description
[Technical Field]
[0001] This invention relates to electrical, electronic, and computer technologies, and more specifically, to improved machine learning systems.
[Background Art]
[0002] Training and using artificial intelligence (AI) models can consume a considerable amount of energy. Various techniques have been developed to mitigate the negative consequences of this effect. Conventional methods for reducing the carbon footprint of AI systems often focus on reducing energy consumption during training rather than during inference (i.e., current techniques do not consider reducing the energy consumption of inference application programming interfaces (APIs), and it is difficult for current inference APIs to dynamically adjust their energy consumption levels in response to demand). Other techniques for reducing energy consumption rely on model compression or low-precision hardware, which can compromise model accuracy.
[Summary of the Invention]
[Problems to Be Solved by the Invention]
[0003] Conventional neural networks feed input data directly into the input layer, which can lead to relatively high energy consumption in processing the input data 104.
[Means for Solving the Problems]
[0004] Principles of the present invention provide a model-independent input transformation technique for neural networks. In one embodiment, an exemplary method includes the steps of: using a first machine learning system to learn an input transformation function for transforming input data for a second machine learning system, wherein the learning is based on minimizing the sum of task loss and post-activation density loss; transforming the input data using the learned input transformation function to modify the post-activation density so as to reduce the amount of energy consumed in the inference task; and using the second machine learning system to perform the inference task on the transformed input data.
[0005] In one embodiment, a non-transitory computer-readable medium includes computer-executable instructions which, when executed by a computer, cause the computer to perform a method comprising: learning an input transformation function for transforming input data for a second machine learning system using a first machine learning system, wherein the learning step is based on minimizing the sum of task loss and post-activation density loss; transforming the input data using the learned input transformation function to modify the post-activation density so as to reduce the amount of energy consumed in the inference task; and performing the inference task on the transformed input data using the second machine learning system.
[0006] In one embodiment, the apparatus comprises a memory and at least one processor coupled to the memory, wherein the at least one processor is operable to perform operations including learning an input transformation function for transforming input data for a second machine learning system, wherein learning is based on minimizing the sum of task loss and post-activation density loss; transforming the input data using the learned input transformation function to modify the post-activation density to reduce the amount of energy consumed by the inference task; and performing the inference task on the transformed input data using the second machine learning system.
[0007] As used herein, “facilitating” an action includes performing the action, making the action easier, helping to carry out the action, or causing the action to be performed. Thus, by way of example and not limitation, instructions executing on one processor may facilitate an action carried out by instructions executing on a remote processor, by sending appropriate data or commands to cause or aid the action to be performed. For the avoidance of doubt, where an agent facilitates an action by means other than performing the action, the action is nevertheless performed by some entity or combination of entities.
[0008] One or more embodiments or elements of the present invention may be implemented in the form of a computer program product including a computer-readable storage medium with computer-usable program code for performing the method steps indicated. Furthermore, one or more embodiments or elements of the present invention may be implemented in the form of a system (or apparatus) including memory and at least one processor coupled to the memory and operable to perform the exemplary method steps. Moreover, in another aspect, one or more embodiments or elements of the present invention may be implemented in the form of means for carrying out one or more of the method steps described herein; the means may include (i) a hardware module, (ii) a software module stored in a computer-readable storage medium (or multiple such media) and implemented on a hardware processor, or (iii) a combination of (i) and (ii), where any of (i) to (iii) implements the specific techniques described herein.
[0009] The techniques of the present invention can provide highly beneficial technical effects. For example, one or more embodiments provide one or more of the following:
[0010] Energy Reduction during Inference
[0011] Energy-saving Techniques Independent of Model Compression and Accuracy Degradation
[0012] Neural Energy-saving Functions That Can Be Configured for Different Hardware Architectures
[0013] Customizable Energy Criteria for Underlying Sparsity-compliant Hardware, Such as Activation Density, and Other Energy-related Criteria Suitable for Learning a Corresponding Neural Energy Saver to Reduce Energy Cost
[0014] Dynamic (On-the-fly) Adjustment of the Energy-accuracy Trade-off
[0015] Some embodiments may not have these potential advantages, and these potential advantages are not necessarily required in all embodiments. These and other objects, features and advantages of the present invention should become apparent from the following detailed description of exemplary embodiments of the present invention, which is to be read in conjunction with the accompanying drawings.
Brief Description of the Drawings
[0016] [Figure 1] A top-level block diagram of an energy-saving neural network system according to an exemplary embodiment.
[0017] [Figure 2] An overview of an exemplary workflow for obtaining energy savings in a neural network that uses a conversion module according to an exemplary embodiment.
[0018] [Figure 3] A table showing the trade-off between accuracy and activation density for exemplary configurations according to an exemplary embodiment.
[0019] [Figure 4] A table showing the energy-saving effectiveness for different conventional models according to an exemplary embodiment.
[0020] [Figure 5] An overview of a hardware simulation infrastructure for energy-per-instruction measurements according to an exemplary embodiment.
[0021] [Figure 6] A table showing energy reduction achieved by exploiting sparsity in a neural energy saver model according to an exemplary embodiment.
[0022] [Figure 7] A diagram showing a cloud computing environment according to an embodiment of the present invention.
[0023] [Figure 8] A diagram showing abstraction model layers according to an embodiment of the present invention.
[0024] [Figure 9] A computer system that may be useful in the implementation of one or more aspects of the present invention, or elements thereof, or both, and that represents a cloud computing node according to an embodiment of the present invention.
Best Mode for Carrying Out the Invention
[0025] Figure 1 is a top-level block diagram of an energy-saving neural network system 100 according to an exemplary embodiment. The energy-saving neural network system 100 includes a neural network 112 and an energy-saving conversion module 108. As shown in Figure 1, the neural network 112 includes an input layer, several hidden layers, and an output layer. Conventional neural networks input data directly into the input layer, which can result in relatively high energy consumption in processing the input data 104. In one exemplary embodiment, the data 104 is converted by the energy-saving conversion module 108 before being processed by the trained neural network 112, which results in considerable energy savings during the inference phase.
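As a rough illustration of the arrangement in Figure 1, the sketch below wraps an unchanged "network" behind a conversion step that runs first. It is a toy under stated assumptions, not the patented implementation: the class name EnergySavingSystem, the one-layer toy_network, and the additive shift theta are all hypothetical, chosen only to show that converting the input can leave fewer units active while the network itself is untouched.

```python
from typing import Callable, List

Vector = List[float]


class EnergySavingSystem:
    """Toy stand-in for system 100: conversion module 108 feeding network 112."""

    def __init__(self, transform: Callable[[Vector], Vector],
                 network: Callable[[Vector], Vector]) -> None:
        self.transform = transform  # plays the role of the energy-saving conversion module 108
        self.network = network      # plays the role of the trained neural network 112 (unchanged)

    def infer(self, x: Vector) -> Vector:
        # Input data 104 is converted first, then passed to the unmodified network.
        return self.network(self.transform(x))


def toy_network(x: Vector) -> Vector:
    # Hypothetical one-layer "network": ReLU(x - 1) element-wise.
    return [max(0.0, xi - 1.0) for xi in x]


theta = [-0.5, -0.5, -0.5]  # hypothetical learned additive shift
system = EnergySavingSystem(lambda x: [xi + ti for xi, ti in zip(x, theta)], toy_network)

print(toy_network([2.0, 1.2, 0.5]))   # without conversion: two non-zero outputs
print(system.infer([2.0, 1.2, 0.5]))  # with conversion: one non-zero output
```

Keeping the transform as a separate callable mirrors the model-independent idea: the same wrapper works with any network function, and swapping transforms requires no change to the network.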
[0026] Figure 2 outlines an exemplary workflow 200 for achieving energy savings in a neural network 112 using a transformation module 108, according to an exemplary embodiment. A pre-trained neural network 112 and training data are acquired (operation 204). Note that the energy-saving neural network system 100 is not tied to any particular pre-trained neural network model (it applies to models with rectified linear unit (ReLU) activations, whose minimum activation value is zero), and it does not rely on model compression or on degrading predictions. An energy-saving input transformation function is learned for the neural network 112 to reduce energy consumption (operation 208). Multiple neural energy saver transformation functions may be learned to provide different levels of energy saving. The neural energy saver is applied in the deployed system 100 by transforming the input data 104 with one of the learned transformation functions before it is processed by the neural network 112 (operation 212). Inference power consumption is thereby reduced, and an on-demand trade-off between energy and performance is provided (via the selection of a specific transformation function).
[0027] Learning Neural Energy Saving Functions
[0028] In one exemplary embodiment, the ReLU post-activation density is used as the energy criterion, because denser activations generally consume more energy. Given a data input x, the number of non-zero activation values is defined as $\sum_{z \in \text{all post-activation values}} \mathbb{1}[z(x) \neq 0]$, where $\mathbb{1}[\cdot]$ is the indicator function. Energy savings during the inference phase can be achieved by transforming the input data 104 so that the number of non-zero activation values is reduced.
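The count defined above can be sketched for a small fully connected ReLU network as follows. This is a minimal illustration, assuming a plain list-of-rows weight format and hand-picked hypothetical weights; a real model would collect post-activation values with its framework's own mechanisms.

```python
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def dense(layer: Layer, x: Sequence[float]) -> List[float]:
    weights, bias = layer
    # One fully connected layer: one weight row per output unit.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def post_activations(layers: Sequence[Layer], x: Sequence[float]) -> List[float]:
    """Collect the post-ReLU value z(x) of every unit in every layer."""
    values: List[float] = []
    h = list(x)
    for layer in layers:
        h = relu(dense(layer, h))
        values.extend(h)
    return values


def count_nonzero(layers: Sequence[Layer], x: Sequence[float]) -> int:
    # Sum of the indicator 1[z(x) != 0] over all post-activation values.
    return sum(1 for z in post_activations(layers, x) if z != 0.0)


# Hypothetical two-layer ReLU network (weights chosen for illustration only).
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]),
    ([[1.0, 1.0]], [0.0]),
]
print(count_nonzero(layers, [2.0, 1.0]))  # -> 3 (all three units active)
print(count_nonzero(layers, [1.0, 0.0]))  # -> 2 (same input shifted by -1)
```

Shifting the input by -1 in each coordinate switches off one hidden unit, which is the kind of effect the learned transformation is meant to exploit.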
[0029] In one exemplary embodiment, given a pre-trained model, the parameterized input transformation function $g_\theta(\text{data input})$ of the transformation module 108 is trained (using training data) with the objective
$$\min_{\theta} \; \mathrm{Loss}_{\text{task}}(\theta) + \lambda \cdot \mathrm{Loss}_{\text{density}}(\theta)$$
where θ represents the parameters of the input transformation function. The first loss, $\mathrm{Loss}_{\text{task}}$, ensures good accuracy on the task, while the second loss, $\mathrm{Loss}_{\text{density}}$, ensures that the post-activation density is small enough to obtain energy savings. λ balances the two losses, thereby allowing a trade-off between accuracy and energy consumption. In one exemplary embodiment, the input transformation function is trained using the same training samples that were used to train the neural network 112. The transformation module 108 may be implemented, for example, in higher-level code that encodes the logic disclosed herein and is then compiled or interpreted into computer-executable instructions.
[0030] In one exemplary embodiment, a plurality of transformation functions $g_\theta$ are trained with different values of λ. The user can then select a value of λ (or another parameter corresponding to λ, as described more fully below), and the corresponding transformation function is used to transform the input data 104 before it is processed by the neural network 112. In this way, energy savings can be traded against accuracy on demand (i.e., neural energy savers with different levels of energy savings and accuracy are available).
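The idea of keeping one trained transformation per λ and picking one at run time can be sketched as a small lookup structure. The names build_saver_bank, select_transform, and toy_train_fn are hypothetical; toy_train_fn is only a placeholder for real training, and choosing the nearest trained λ is an assumed selection policy rather than one stated in this document.

```python
from typing import Callable, Dict, Iterable, List

Vector = List[float]
Transform = Callable[[Vector], Vector]


def build_saver_bank(lambdas: Iterable[float],
                     train_fn: Callable[[float], Transform]) -> Dict[float, Transform]:
    """Train one transformation function per trade-off value lambda."""
    return {lam: train_fn(lam) for lam in lambdas}


def select_transform(bank: Dict[float, Transform], requested_lam: float) -> Transform:
    # Assumed policy: use the transform trained with the lambda closest to the request.
    nearest = min(bank, key=lambda lam: abs(lam - requested_lam))
    return bank[nearest]


def toy_train_fn(lam: float) -> Transform:
    # Placeholder for real training: a fixed shift that grows with lambda.
    return lambda x: [xi - lam for xi in x]


bank = build_saver_bank([0.01, 0.05, 0.1], toy_train_fn)
transform = select_transform(bank, 0.06)  # nearest trained lambda is 0.05
print(transform([1.0]))
```

Training all variants up front keeps run-time switching cheap: changing the energy-accuracy operating point is a dictionary lookup, not a retraining job.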
[0031] In one exemplary embodiment, an additive function is used for $g_\theta(x)$ by setting $g_\theta(x) = x + \theta$, where θ is shared by all data inputs. After transformation, the input to the neural network 112 is $g_\theta(x) = x + \theta$. In one exemplary embodiment, $\mathrm{Loss}_{\text{task}}(\theta)$ is the task-specific loss, and the density loss is defined as
$$\mathrm{Loss}_{\text{density}}(\theta) = \sum_{z \in \text{all post-activation values}} \tanh\big[z(x + \theta)\big],$$
where tanh is used to approximate the 0-1 count of the activation density (post-ReLU values are non-negative, so tanh maps zero activations to 0 and large activations toward 1), and λ controls the energy-accuracy trade-off. More complex functions $g_\theta(x)$, e.g., non-linear functions, are also contemplated.
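The additive transformation and the tanh density surrogate can be exercised end to end on a tiny example. This is a minimal sketch under several assumptions that do not come from the document: the frozen network weights W, B, V are hypothetical, the task loss is a squared-error stand-in that compares against the frozen model's own predictions (instead of a labelled classification loss), and gradients are taken by central finite differences with a backtracking step instead of backpropagation, so that the block needs only the standard library.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Sample = Tuple[Vector, float]

# Hypothetical frozen "pre-trained" network: 2 inputs -> 3 ReLU units -> linear output.
W = [[1.0, -0.5], [0.3, 0.8], [-0.6, 1.0]]
B = [0.2, 0.1, 0.4]
V = [1.0, -1.0, 0.5]


def hidden(x: Sequence[float]) -> Vector:
    # Post-activation values z(x) of the ReLU layer.
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W, B)]


def output(x: Sequence[float]) -> float:
    return sum(v * h for v, h in zip(V, hidden(x)))


def g(x: Sequence[float], theta: Sequence[float]) -> Vector:
    # Additive transformation g_theta(x) = x + theta, with theta shared by all inputs.
    return [xi + ti for xi, ti in zip(x, theta)]


def task_loss(theta: Sequence[float], data: Sequence[Sample]) -> float:
    # Assumed stand-in for Loss_task: squared error against the frozen model's outputs.
    return sum((output(g(x, theta)) - y) ** 2 for x, y in data) / len(data)


def density_loss(theta: Sequence[float], data: Sequence[Sample]) -> float:
    # Loss_density: tanh of each post-ReLU value, a smooth proxy for 1[z != 0].
    return sum(sum(math.tanh(z) for z in hidden(g(x, theta))) for x, _ in data) / len(data)


def objective(theta: Sequence[float], data: Sequence[Sample], lam: float) -> float:
    return task_loss(theta, data) + lam * density_loss(theta, data)


def numerical_grad(f: Callable[[Vector], float], theta: Vector, eps: float = 1e-5) -> Vector:
    # Central finite differences (used here instead of backpropagation for brevity).
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def learn_theta(data: Sequence[Sample], lam: float, steps: int = 200, lr: float = 0.1) -> Vector:
    theta = [0.0] * len(data[0][0])

    def f(t: Vector) -> float:
        return objective(t, data, lam)

    current = f(theta)
    for _ in range(steps):
        grad = numerical_grad(f, theta)
        step = lr
        # Backtracking: halve the step until the objective strictly decreases.
        while step > 1e-8:
            candidate = [t - step * gi for t, gi in zip(theta, grad)]
            value = f(candidate)
            if value < current:
                theta, current = candidate, value
                break
            step /= 2
    return theta


samples = [[1.0, 0.5], [0.2, 1.0], [0.8, 0.8], [1.5, -0.2]]
data = [(x, output(x)) for x in samples]  # targets = original predictions
theta = learn_theta(data, lam=1.0)
print(theta, density_loss([0.0, 0.0], data), density_loss(theta, data))
```

Because the stand-in task loss is exactly zero at θ = 0 and the line search only accepts strict decreases, any accepted step must lower the density term; raising lam pushes θ further toward sparser activations at a larger cost in task loss.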
[0032] In one exemplary embodiment, to control the energy-saving mode, the user simply enables or disables it: when the energy-saving mode is disabled, no transformation is applied (i.e., λ is set to zero), and when it is enabled, λ is set to a value selected by the user or to a predetermined value.
[0033] In one exemplary embodiment, energy measurements are obtained for each of several ranges of activation density for a given hardware platform and a given neural network model. A table is created that associates the energy measurements, the energy savings (e.g., energy savings rate), or both with accuracy levels, values of λ, transformation functions, or any combination thereof. This table can be used to select a value of λ based on a selected accuracy level, energy consumption level, or energy savings level defined in the table. In one exemplary embodiment, the user may specify an accuracy level, density value, energy level, or energy savings rate, and the appropriate transformation function is selected based on the corresponding value of λ.
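The on/off switch and the table-driven choice of λ described in the two paragraphs above could be combined roughly as follows. The SaverEntry fields, the rule "largest energy saving among entries that meet the accuracy floor", and the fallback to λ = 0 when nothing qualifies are all assumptions for illustration, and the table values are placeholders rather than measurements reported here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SaverEntry:
    lam: float            # trade-off factor used to train this transformation
    accuracy: float       # accuracy measured with this transformation
    energy_saving: float  # fractional energy saving measured on the target hardware


def choose_lambda(table: List[SaverEntry], min_accuracy: float,
                  energy_saving_mode: bool = True) -> float:
    # Energy-saving mode disabled: no transformation, i.e. lambda = 0.
    if not energy_saving_mode:
        return 0.0
    eligible = [e for e in table if e.accuracy >= min_accuracy]
    if not eligible:
        return 0.0  # assumed fallback: no entry meets the accuracy level, so do not transform
    # Among entries meeting the requested accuracy, take the largest energy saving.
    return max(eligible, key=lambda e: e.energy_saving).lam


# Placeholder numbers for illustration only; they are not measurements from this document.
table = [
    SaverEntry(lam=0.01, accuracy=0.93, energy_saving=0.05),
    SaverEntry(lam=0.05, accuracy=0.91, energy_saving=0.15),
    SaverEntry(lam=0.10, accuracy=0.86, energy_saving=0.25),
]
print(choose_lambda(table, min_accuracy=0.90))  # -> 0.05
```

The same table could just as easily be queried by an energy budget instead of an accuracy floor; only the filter condition would change.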
[0034] Performance evaluation
[0035] Exemplary embodiments were evaluated with three pre-trained model architectures on a conventional classification task (a dataset containing over 50,000 color images in multiple classes, each class containing over 5,000 images), and sparsity-aware hardware was simulated for the energy evaluation. Figure 3 is a table showing the trade-offs between accuracy and activation density for exemplary configurations according to an exemplary embodiment. As the table in Figure 3 demonstrates, increasing λ (and with it the energy savings) decreases both the overall accuracy and the average density on the test set (where the density of a data point = non-zero activations / total activations, and the average density of the test set = the mean density over all test-set data samples).
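The two density definitions given in parentheses above translate directly into code. This sketch assumes that the post-activation values of each test sample have already been collected into a flat list; the two-sample test set is hypothetical.

```python
from typing import Sequence


def sample_density(activations: Sequence[float]) -> float:
    # Density of one data point: non-zero activations / total activations.
    return sum(1 for a in activations if a != 0.0) / len(activations)


def average_density(test_set: Sequence[Sequence[float]]) -> float:
    # Average density of the test set: mean of the per-sample densities.
    return sum(sample_density(a) for a in test_set) / len(test_set)


# Hypothetical post-activation values for two test samples.
test_set = [[0.0, 1.2, 0.0, 3.0], [0.5, 0.5, 0.5, 0.0]]
print(average_density(test_set))  # -> 0.625
```

Averaging per-sample densities (rather than pooling all activations) weights every test sample equally, matching the definition in the text.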
[0036] Effectiveness across different networks
[0037] Figure 4 is a table showing the energy-saving effectiveness for different conventional models according to an exemplary embodiment. In the example of Figure 4, λ = 5e-2 (0.05). As shown in Figure 4, each model has a corresponding baseline accuracy, baseline density, accuracy after neural energy saving, and density after neural energy saving.
[0038] In one exemplary embodiment, energy reduction is achieved by exploiting sparsity in a hardware accelerator. The reduction in dynamic energy during convolutional-layer computation is estimated for the disclosed neural energy saver model relative to a baseline model. The accelerator type considered performs outer-product-based matrix multiplication (e.g., the inline matrix multiplication accelerator (MMA), a commercially available matrix math hardware accelerator in a modern high-end processor such as the IBM POWER10 processor (POWER® is a registered trademark of International Business Machines Corporation (IBM), Armonk, New York, USA)), in which the convolution operation is implemented as vector-vector outer-product matrix multiplication using fine-grained MMA instructions, each of which performs 16 multiplication operations. Computational energy was reduced through two architectural features added to exploit activation sparsity.
[0039] A) Reduce dynamic energy by skipping instructions: Use conditional execution to skip instructions when the operand (4-element activation vector) is zero.
[0040] B) Reduce dynamic energy by skipping computations within an instruction: Skip specific operations within an instruction using an element-wise mask that corresponds to the zero-valued elements of the four-element activation vector.
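The effect of features A and B on the amount of work can be sketched by counting, for a stream of four-element activation operands, how many instructions and multiplications remain. This sketch counts work only; it does not reproduce the energy formulas used in the evaluation. It assumes the weight operand of every instruction is dense, so each zero activation element removes exactly one row (4 products) of the 4 × 4 outer product; the function name count_work is hypothetical.

```python
from typing import Sequence, Tuple

LANES = 4              # 32-bit activation elements per 128-bit operand vector
MULTS_PER_INSTR = 16   # a 4 x 4 outer product per instruction


def count_work(activation_vectors: Sequence[Sequence[float]]) -> Tuple[int, int, int, int]:
    """Return (baseline instrs, instrs with feature A, baseline mults, mults with feature B)."""
    instrs = instrs_a = mults = mults_b = 0
    for vec in activation_vectors:
        if len(vec) != LANES:
            raise ValueError("each activation operand must have 4 elements")
        nonzero = sum(1 for a in vec if a != 0.0)
        instrs += 1
        mults += MULTS_PER_INSTR
        # Feature A: conditional execution skips the instruction if the operand is all zero.
        if nonzero > 0:
            instrs_a += 1
        # Feature B: each zero activation element masks the 4 products it would feed
        # (one row of the 4 x 4 outer product, assuming a dense weight operand).
        mults_b += nonzero * (MULTS_PER_INSTR // LANES)
    return instrs, instrs_a, mults, mults_b


operands = [[0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 2.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
print(count_work(operands))  # -> (3, 2, 48, 24)
```

Feature A only helps when an entire operand vector is zero, whereas feature B benefits from any zero element, which is why lowering the post-activation density matters for both.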
[0041] Methodology for energy estimation
[0042] For both features A and B, the model's convolutional layers were mapped to MMA instructions, which are used to accelerate general matrix multiplication (GEMM) and, consequently, convolution in deep neural network (DNN) inference tasks, as described in Moreira, Jose E., et al., "A matrix math facility for Power ISA (TM) processors," arXiv preprint arXiv:2104.03142 (2021). The MMA unit performs rank-k update operations in which the outer product of two matrices is accumulated into the output matrix. An MMA instruction takes its inputs from 128-bit vector-scalar registers and writes its output to a 512-bit accumulator register, enabling 4 × 4 matrix multiplication operations on 32-bit values. Two-dimensional and three-dimensional convolutions between the kernel and the image are performed through a series of outer-product-based matrix multiplication operations.
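The accumulation pattern described above, in which a matrix product is built from repeated outer products into a 4 × 4 accumulator tile, can be imitated in plain Python. This is only a functional model of the arithmetic, not of the hardware: outer_accumulate stands in for one outer-product instruction, and the tiling loop and function names are assumptions for illustration.

```python
from typing import List, Sequence

Matrix = List[List[float]]
TILE = 4  # a 512-bit accumulator holds a 4 x 4 tile of 32-bit values


def outer_accumulate(acc: Matrix, a: Sequence[float], b: Sequence[float]) -> None:
    # Rank-1 update acc += a b^T (what one outer-product instruction does on a tile).
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            acc[i][j] += ai * bj


def gemm_outer_product(A: Matrix, B: Matrix) -> Matrix:
    """C = A x B built tile by tile from a sequence of rank-1 (outer-product) updates."""
    m, k_dim, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, TILE):
        for j0 in range(0, n, TILE):
            rows, cols = min(TILE, m - i0), min(TILE, n - j0)
            acc = [[0.0] * cols for _ in range(rows)]
            for k in range(k_dim):
                a = [A[i0 + r][k] for r in range(rows)]  # column slice of A
                b = B[k][j0:j0 + cols]                    # row slice of B
                outer_accumulate(acc, a, b)
            for r in range(rows):
                C[i0 + r][j0:j0 + cols] = acc[r]
    return C


A = [[float(i + k) for k in range(3)] for i in range(5)]
B = [[float(k * j - 1) for j in range(6)] for k in range(3)]
naive = [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(6)] for i in range(5)]
print(gemm_outer_product(A, B) == naive)  # -> True
```

Each call to outer_accumulate touches one column of A and one row of B; when the activation column slice contains zeros, the corresponding rows of the update contribute nothing, which is what features A and B exploit.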
[0043] We obtained the sparsity of the operands of each instruction in each convolutional layer, as well as the number of such instructions. The total dynamic energy for computation in the convolutional layers (between instructions), which is workload-dependent, is defined as follows:
[Math 1]
[0044] The total dynamic energy for calculations in the convolutional layer (within an instruction) is defined as follows:
[Math 2]
[0045] Figure 5 shows an overview of the hardware simulation infrastructure for energy-per-instruction measurement in an exemplary embodiment. A processor 504 with a core 508 containing an inline matrix multiplication accelerator was used to perform two evaluations: a) executing a single matrix-multiply instruction (as used in general matrix multiplication, GEMM) that computes the outer product of two m × m matrices initialized to random values (where m = 4 in the exemplary embodiment using core 508), and b) executing general matrix multiplication programs on matrices of dimensions M × K and K × N, each initialized to a different sparsity level (M, N, K >> m). This environment was replicated to create a full-system simulation environment 512, in which register transfer level (RTL) simulation was enabled by generating RTL-executable test cases. Using the RTL simulation, the energy per general matrix multiplication instruction was computed, as was the computation $C_{M \times N} = A_{M \times K} \times B_{K \times N} + C_{M \times N}$ for different sparsity levels (Y) of A and B. Power and energy were estimated for each component (including the MMA), including the energy per unit.
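Preparing GEMM operands "initialized to a different sparsity level" can be sketched on the host side as below. This does not generate RTL test cases; it only builds A and B for $C_{M \times N} = A_{M \times K} \times B_{K \times N} + C_{M \times N}$ with an exact fraction of zeros. The matrix sizes, seeds, the value range [0.1, 1.0] for non-zero entries, and the function names are hypothetical choices.

```python
import random
from typing import List

Matrix = List[List[float]]


def random_sparse_matrix(rows: int, cols: int, sparsity: float, seed: int = 0) -> Matrix:
    """Random matrix with round(sparsity * rows * cols) zeros at random positions."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the same test case can be regenerated
    total = rows * cols
    zero_positions = set(rng.sample(range(total), round(sparsity * total)))
    # Non-zero entries are drawn from [0.1, 1.0], so they can never be 0.0.
    return [[0.0 if r * cols + c in zero_positions else rng.uniform(0.1, 1.0)
             for c in range(cols)] for r in range(rows)]


def measured_sparsity(matrix: Matrix) -> float:
    total = sum(len(row) for row in matrix)
    return sum(1 for row in matrix for v in row if v == 0.0) / total


# Operands for C(MxN) = A(MxK) x B(KxN) + C(MxN) at two sparsity levels (sizes are hypothetical).
M, K, N = 8, 16, 12
A = random_sparse_matrix(M, K, sparsity=0.5, seed=1)
B = random_sparse_matrix(K, N, sparsity=0.25, seed=2)
print(measured_sparsity(A), measured_sparsity(B))  # -> 0.5 0.25
```

Placing an exact number of zeros (rather than zeroing each entry independently with some probability) makes the realized sparsity equal the requested level, so energy differences can be attributed to the sparsity level itself.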
[0046] Figure 6 is a table illustrating energy savings achieved by exploiting sparsity with the neural energy saver model, according to an exemplary embodiment. Although the sample evaluation embodiment uses an outer-product-based MMA, the neural energy saver model would also yield energy savings on other types of accelerator architectures, such as commercially available systolic-array-based spatial convolution accelerators (a non-limiting example being IBM's RaPID accelerator with sparsity support).
[0047] In light of the above, it will be understood that, in general terms, an exemplary method according to one aspect of the present invention includes the steps of: using a first machine learning system to learn an input transformation function 108 for transforming input data 104 for a second machine learning system, wherein the learning is based on minimizing the sum of task loss and post-activation density loss; using the learned input transformation function 108 to transform the input data 104 and modify the post-activation density so as to reduce the amount of energy consumed in the inference task; and using the second machine learning system to perform the inference task on the transformed input data.
[0048] In one exemplary embodiment, the second machine learning system is implemented as a neural network 112. In one exemplary embodiment, learning the input transformation function 108 includes balancing the task-specific loss at inference time against the post-activation density of the neural network 112 based on a specified trade-off factor. In one exemplary embodiment, the learning operation is repeated with different specified trade-off factors to generate additional input transformation functions 108, and the transformation operation is performed using the input transformation function 108 that corresponds to a selected one of the specified trade-off factors.
[0049] In one exemplary embodiment, one of the input transformation functions 108 is selected. In one exemplary embodiment, the selection is based on one of: the selected trade-off factor, an energy level, an energy-saving level, identification information for one of the input transformation functions 108, and an accuracy level. In one exemplary embodiment, the task loss ensures good accuracy of the inference task, while the post-activation density loss ensures that the post-activation density becomes small enough to reduce energy consumption, and each specified trade-off factor balances the task loss against the post-activation density loss to enable a trade-off between accuracy and energy consumption.
[0050] In one embodiment, a non-transitory computer-readable medium includes computer-executable instructions which, when executed by a computer, cause the computer to perform a method comprising: learning an input transformation function 108 for transforming input data 104 for a second machine learning system using a first machine learning system, wherein the learning step is based on minimizing the sum of task loss and post-activation density loss; transforming the input data 104 using the learned input transformation function 108 to modify the post-activation density so as to reduce the amount of energy consumed in the inference task; and performing the inference task on the transformed input data using the second machine learning system.
[0051] In one embodiment, the apparatus comprises a memory and at least one processor coupled to the memory, wherein the at least one processor is operable to perform operations including learning an input transformation function 108 for transforming input data 104 for a second machine learning system using a first machine learning system, wherein learning is based on minimizing the sum of task loss and post-activation density loss; transforming the input data 104 using the learned input transformation function 108 to modify the post-activation density to reduce the amount of energy consumed in the inference task; and performing the inference task on the transformed input data using the second machine learning system.
[0052] While this disclosure includes a detailed description of cloud computing, it should be understood that the implementation of the teachings described herein is not limited to cloud computing environments. Rather, embodiments of the present invention can be implemented in conjunction with any other type of computing environment that is currently known or may be developed in the future.
[0053] Cloud computing is a service delivery model that enables convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, network bandwidth, servers, processing, memory, storage, applications, virtual machines, and services) that can be rapidly provisioned and deployed with minimal management effort or interaction with service providers. This cloud model may include at least five characteristics, at least three service models, and at least four deployment models.
[0054] The characteristics are as follows:
[0055] On-demand self-service: Cloud consumers can unilaterally provision computing power, such as server time and network storage, automatically as needed, without requiring human interaction with service providers.
[0056] Broad network access: Capabilities are available over the network and accessed through standard mechanisms that facilitate use by heterogeneous thin-client or thick-client platforms (e.g., mobile phones, laptops, and PDAs).
[0057] Resource pooling: A provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with various physical and virtual resources dynamically allocated and reallocated according to demand. Consumers generally do not have control over, or knowledge of, the exact location of the resources provided, although they may be able to specify the location at a higher level of abstraction (e.g., country, state, or data center), thus exhibiting location independence.
[0058] Rapid elasticity: Capabilities can be provisioned rapidly and elastically, in some cases automatically, to scale out quickly, and released rapidly to scale in quickly. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time.
[0059] Measured service: Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and the consumer of the service.
[0060] The service model is as follows:
[0061] Software as a Service (SaaS): The capability offered to consumers is the use of a provider's applications running on cloud infrastructure. These applications are accessible from various client devices through thin client interfaces such as web browsers (e.g., web-based email). Consumers do not manage or control the underlying cloud infrastructure, including networks, servers, operating systems, storage, or even individual application capabilities. However, limited user-specific application configuration settings may be an exception.
[0062] Platform as a Service (PaaS): The capability offered to consumers is the ability to deploy applications they have created or acquired, written using programming languages and tools supported by the provider, onto a cloud infrastructure. Consumers do not manage or control the underlying cloud infrastructure, including networks, servers, operating systems, or storage, but they do control the deployed applications and, in some cases, the configuration of the application hosting environment.
[0063] Infrastructure as a Service (IaaS): The capability provided to consumers is the provisioning of processing, storage, networking, and other fundamental computing resources, on which consumers can deploy and run arbitrary software, including operating systems and applications. Consumers do not manage or control the underlying cloud infrastructure, but they do control the operating system, storage, and deployed applications, and, in some cases, have limited control over selected networking components (e.g., host firewalls).
[0064] The deployment model is as follows:
[0065] Private Cloud: Cloud infrastructure operated exclusively for one organization. This can be managed by the organization or a third party and can reside on-premises or off-premises.
[0066] Community Cloud: A cloud infrastructure shared by several organizations to support a specific community with common interests (e.g., mission, security requirements, policies, and compliance considerations). It can be managed by an organization or a third party and can reside on-premises or off-premises.
[0067] Public cloud: Cloud infrastructure is made available to the general public or large industry groups and is owned by organizations that sell cloud services.
[0068] Hybrid cloud: A cloud infrastructure that combines two or more clouds (private, community, or public), where each cloud remains a distinct entity, but is bound together by standardized or proprietary technologies that enable data and application portability (e.g., cloud bursting for load balancing across clouds).
[0069] Cloud computing environments are service-oriented, emphasizing statelessness, low coupling, modularity, and semantic interoperability. At the core of cloud computing lies an infrastructure that includes a network of interconnected nodes.
[0070] Referring here to Figure 7, an exemplary cloud computing environment 50 is shown. As illustrated, the cloud computing environment 50 includes one or more cloud computing nodes 10 that can communicate with local computing devices used by cloud consumers, such as a personal digital assistant (PDA) or cellular phone 54A, a desktop computer 54B, a laptop computer 54C, or an automotive computer system 54N or a combination thereof. The nodes 10 can communicate with each other. They may be physically or virtually grouped (not shown) in one or more networks, or a combination thereof, such as private, community, public, or hybrid clouds as described above. This allows the cloud computing environment 50 to provide infrastructure, platforms, or software or a combination thereof as a service, without requiring cloud consumers to maintain resources on their local computing devices. The types of computing devices 54A-N shown in Figure 7 are for illustrative purposes only, and it should be understood that the computing nodes 10 and the cloud computing environment 50 can communicate with any type of computerized device via any type of network, or a network addressable connection, or both (for example, using a web browser).
[0071] Referring now to Figure 8, a set of functional abstraction layers provided by the cloud computing environment 50 (Figure 7) is shown. It should be understood in advance that the components, layers, and functionalities shown in Figure 8 are for illustrative purposes only and that embodiments of the present invention are not limited thereto. As shown in the figure, the following layers and corresponding functionalities are provided:
[0072] The hardware and software layer 60 includes hardware components and software components. Examples of hardware components include a mainframe 61, RISC (Reduced Instruction Set Computer) architecture-based servers 62, 63, blade servers 64, storage devices 65, and network and networking components 66. In some embodiments, the software components include network application server software 67 and database software 68.
[0073] The virtualization layer 70 provides an abstraction layer from which the following examples of virtual entities may be provided: virtual servers 71, virtual storage 72, virtual networks 73 including virtual private networks, virtual applications and operating systems 74, and virtual clients 75.
[0074] In one example, the management layer 80 may provide the following functions: Resource provisioning 81 provides the dynamic procurement of computing and other resources used to perform tasks within the cloud computing environment. Metering and pricing 82 provides cost tracking as resources are used within the cloud computing environment and billing or invoicing for the consumption of these resources. In one example, these resources may include application software licenses. Security provides identity verification for cloud consumers and tasks, and protection for data and other resources. User portal 83 provides consumers and system administrators with access to the cloud computing environment. Service level management 84 provides the allocation and management of cloud computing resources to ensure that required service levels are met. Service level agreement (SLA) planning and execution 85 pre-arranges and procures cloud computing resources that are expected to be in future demand according to the SLA.
[0075] The workload layer 90 provides examples of functionality for which the cloud computing environment may be utilized. Examples of workloads and functions that may be provided from this layer include mapping and navigation 91, software development and lifecycle management 92, virtual classroom education delivery 93, data analytics processing 94, transaction processing 95, and at least part of the neural network transformer 96 (one non-limiting use case is a local user who owns a model and uses the cloud environment to obtain such a neural energy server through cloud training).
[0076] One or more embodiments or elements of the present invention can be implemented in the form of an apparatus including a memory and at least one processor that is coupled to the memory and operative to perform exemplary method steps. Figure 9 shows a computer system that may be useful in implementing one or more aspects, elements, or both of the present invention, and which may also represent a cloud computing node according to an embodiment of the present invention. Referring to Figure 9, the cloud computing node 10 is only one example of a suitable cloud computing node and is not intended to suggest any limitation as to the scope of use or functionality of the embodiments of the present invention described herein. Regardless, the cloud computing node 10 can be implemented to carry out, or can itself carry out, any of the functionality described herein.
[0077] The cloud computing node 10 includes a computer system / server 12 that is operational with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, configurations, or combinations thereof that may be suitable for use with the computer system / server 12 include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems or devices.
[0078] The computer system / server 12 can be described in the general context of computer system executable instructions, such as program modules, that are executed by the computer system. Generally, a program module may include routines, programs, objects, components, logic, data structures, etc., that perform a specific task or implement a specific abstract data type. The computer system / server 12 can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices linked over a communication network. In a distributed cloud computing environment, program modules can reside on both local and remote computer system storage media, including memory storage devices.
[0079] As shown in Figure 9, the computer system / server 12 in the cloud computing node 10 is shown in the form of a general-purpose computing device. The components of the computer system / server 12 may include, but are not limited to, one or more processors or processing units 16, system memory 28, and a bus 18 that connects various system components, including the system memory 28, to the processor 16.
[0080] Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
[0081] The computer system / server 12 typically includes various computer system-readable media. Such media can be any available media accessible by the computer system / server 12, and include both volatile and non-volatile media, as well as removable and non-removable media.
[0082] The system memory 28 may include computer system-readable media in the form of volatile memory, such as random access memory (RAM) 30, cache memory 32, or both. The computer system / server 12 may further include other removable / non-removable, volatile / non-volatile computer system storage media. By way of example only, a storage system 34 may be provided for reading from and writing to a non-removable, non-volatile magnetic medium (not shown, and typically called a “hard drive”). Although not shown, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a “floppy disk”) and an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM, or other optical medium may also be provided. In such cases, each may be connected to the bus 18 by one or more data media interfaces. As further depicted and described below, the memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the present invention.
[0083] A program / utility 40, having a set (at least one) of program modules 42, may be stored in memory 28 by way of example, and not limitation, as may an operating system, one or more application programs, other program modules, and program data. Each of the operating system, the one or more application programs, the other program modules, and the program data, or some combination thereof, may include an implementation of a networking environment. The program modules 42 generally carry out the functions, methodologies, or both of the embodiments of the present invention as described herein.
[0084] The computer system / server 12 can also communicate with one or more external devices 14, such as a keyboard, pointing device, or display 24, which allow a user to interact with the computer system / server 12, or with any device (e.g., a network card, modem, etc.) that allows the computer system / server 12 to communicate with one or more other computing devices, or a combination thereof. Such communication can be performed via the input / output (I / O) interface 22. Furthermore, the computer system / server 12 can communicate with one or more networks, such as a local area network (LAN), a general wide area network (WAN), or a public network (e.g., the Internet), or a combination thereof, via the network adapter 20. As shown in the figure, the network adapter 20 communicates with other components of the computer system / server 12 via the bus 18. It should be understood that other hardware components, software components, or both, which are not shown, can be used in conjunction with the computer system / server 12. Examples include, but are not limited to, microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data archive storage systems.
[0085] Thus, one or more embodiments can use software running on a general-purpose computer or workstation. Referring to Figure 9, such an implementation may employ, for example, a processor 16, a memory 28, and an input / output interface 22 to a display 24 and to an external device 14 such as a keyboard or pointing device. As used herein, the term “processor” is intended to include any processing device, such as one that includes a CPU (central processing unit), other forms of processing circuitry, or both. Furthermore, the term “processor” may refer to more than one individual processor. The term “memory” is intended to include memory associated with a processor or CPU, such as RAM (random access memory) 30, ROM (read-only memory), a fixed memory device (e.g., a hard drive 34), a removable memory device (e.g., a diskette), flash memory, and the like. In addition, as used herein, the expression “input / output interface” is intended to refer to, for example, one or more mechanisms for inputting data to the processing unit (e.g., a mouse) and one or more mechanisms for presenting results associated with the processing unit (e.g., a printer). The processor 16, memory 28, and input / output interface 22 can be interconnected, for example, via the bus 18 as part of a data processing unit 12. Suitable interconnections via the bus 18 can also be provided to a network interface 20, such as a network card, which can be provided to interface with a computer network, and to a media interface, such as a diskette or CD-ROM drive, which can be provided to interface with suitable media.
[0086] Accordingly, computer software including instructions or code for performing the methodologies of the present invention, as described herein, may be stored in one or more of the associated memory devices (e.g., ROM, fixed or removable memory) and, when ready to be used, loaded in part or in whole (e.g., into RAM) and executed by a CPU. Such software may include, but is not limited to, firmware, resident software, microcode, and the like.
[0087] A data processing system suitable for storing, executing, or both storing and executing program code includes at least one processor 16 coupled directly or indirectly to memory elements 28 through a system bus 18. The memory elements may include local memory employed during actual execution of the program code, bulk storage, and cache memories 32 that provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
[0088] Input / output or I / O devices (including, but not limited to, keyboards, displays, and pointing devices) can be connected to the system either directly or through an intermediary I / O controller.
[0089] A network adapter 20 can also be connected to the system to enable the data processing system to connect to other data processing systems or remote printers or storage devices through an intermediary private or public network. Currently available types of network adapters include modems, cable modems, and Ethernet® cards.
[0090] As used herein, including in the claims, “server” includes a physical data processing system (e.g., system 12 as shown in Figure 9) on which a server program is running. It should be understood that such a physical server may or may not include a display and a keyboard.
[0091] One or more embodiments may be implemented, at least in part, in the context of a cloud or virtual machine environment, but this is illustrative and not limiting. Please refer again to Figures 7-8 and the accompanying text.
[0092] It should be noted that any of the methods described herein may include an additional step of providing a system comprising distinct software modules embodied on a computer-readable storage medium; the modules may include, for example, any or all of the appropriate elements depicted in the block diagrams, described herein, or both, and, by way of example and not limitation, any one, some, or all of the modules / blocks and / or submodules / subblocks described. The method steps can then be carried out using the distinct software modules, submodules, or both of the system, as described above, executing on one or more hardware processors such as 16. Furthermore, a computer program product may include a computer-readable storage medium with code adapted to be implemented to carry out one or more method steps described herein, including the provision of the system with the distinct software modules.
[0093] One example of a user interface that may be used in some cases is hypertext markup language (HTML) code, which is served by a server or other means to the browser on the user's computing device. HTML is parsed by the browser on the user's computing device to create a graphical user interface (GUI).
[0094] Exemplary system and product details
[0095] The present invention may be a system, a method, a computer program product, or a combination thereof, at any possible technical level of integration. The computer program product may include a computer-readable storage medium (or media) having computer-readable program instructions thereon for causing a processor to carry out aspects of the present invention.
[0096] A computer-readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. A computer-readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of computer-readable storage media includes a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. As used herein, a computer-readable storage medium is not to be construed as being a transitory signal per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
[0097] The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to each computing / processing device, or they may be downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, or a wireless network, or a combination thereof. The network may include copper transmission cables, optical transmission fibers, wireless transmissions, routers, firewalls, switches, gateway computers, or edge servers, or a combination thereof. A network adapter card or network interface in each computing / processing device receives computer-readable program instructions from the network and transfers the computer-readable program instructions for storage in a computer-readable storage medium within each computing / processing device.
[0098] The computer-readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, configuration data for integrated circuitry, or either source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk® or C++ and procedural programming languages such as the C programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer as a standalone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, via the Internet using an Internet service provider). In some embodiments, electronic circuitry including, for example, a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA) may execute the computer-readable program instructions by using state information of the computer-readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
[0099] Aspects of the present invention are described herein with reference to flowchart illustrations, block diagrams, or both, of methods, apparatus (systems), and computer program products according to embodiments of the present invention. It will be understood that each block of the flowchart illustrations and block diagrams, and combinations of blocks in the flowchart illustrations and block diagrams, can be implemented by computer-readable program instructions.
[0100] These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions / acts specified in one or more blocks of the flowcharts, block diagrams, or both. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, a programmable data processing apparatus, other devices, or a combination thereof to function in a particular manner, such that the computer-readable storage medium having the instructions stored therein comprises an article of manufacture including instructions that implement aspects of the functions / acts specified in one or more blocks of the flowcharts, block diagrams, or both.
[0101] The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer-implemented process, such that the instructions that execute on the computer, other programmable apparatus, or other device implement the functions / acts specified in one or more blocks of the flowcharts, block diagrams, or both.
[0102] The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of instructions that comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in a block may occur out of the order noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and flowcharts, and combinations of blocks in the block diagrams and flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or that carry out combinations of special-purpose hardware and computer instructions.
[0103] The descriptions of the various embodiments of the present invention have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims
1. A method comprising: a step of learning, using a first machine learning system, an input transformation function that transforms input data for a second machine learning system, wherein the learning step is based on minimizing the sum of a task loss and a post-activation density loss; a step of transforming the input data using the learned input transformation function to change the post-activation density in order to reduce the amount of energy consumed in an inference task; and a step of performing the inference task on the transformed input data using the second machine learning system.
2. The method according to claim 1, wherein the second machine learning system is implemented using a neural network.
3. The method according to claim 2, wherein the step of learning the input transformation function includes balancing an inference-time task-specific loss against the post-activation density of the neural network based on a specified trade-off factor.
4. The method according to any one of claims 1 to 3, further comprising a step of repeating the learning step with different specified trade-off factors to generate additional input transformation functions, wherein the transforming step is performed using a selected input transformation function corresponding to a selected trade-off factor among the specified trade-off factors.
5. The method according to claim 4, further comprising a step of obtaining a selection of the input transformation function.
6. The method according to claim 5, wherein the obtained selection is one of the selected trade-off factor, an energy level, an energy saving level, one of the input transformation functions, and an accuracy level.
7. The method according to claim 4, wherein, in the learning step, the task loss ensures good accuracy of the inference task, while the post-activation density loss ensures that the post-activation density becomes small enough to obtain reduced energy consumption, and each specified trade-off factor functions to balance the task loss and the post-activation density loss in order to enable the trade-off between accuracy and energy consumption.
8. A computer program for causing a computer to execute: a procedure for learning, using a first machine learning system, an input transformation function that transforms input data for a second machine learning system, wherein the learning procedure is based on minimizing the sum of a task loss and a post-activation density loss; a procedure for transforming the input data using the learned input transformation function to modify the post-activation density in order to reduce the amount of energy consumed in an inference task; and a procedure for performing the inference task on the transformed input data using the second machine learning system.
9. The computer program according to claim 8, wherein the second machine learning system is implemented using a neural network.
10. The computer program according to claim 9, wherein the procedure for learning the input transformation function includes balancing an inference-time task-specific loss against the post-activation density of the neural network based on a specified trade-off factor.
11. The computer program according to any one of claims 8 to 10, further causing the computer to execute a procedure for generating additional input transformation functions by repeating the learning procedure with different specified trade-off factors, wherein the transforming procedure is performed using a selected input transformation function corresponding to a selected trade-off factor among the specified trade-off factors.
12. The computer program according to claim 11, further causing the computer to execute a procedure for obtaining a selection of the input transformation function.
13. The computer program according to claim 12, wherein the obtained selection is one of the selected trade-off factor, an energy level, an energy saving level, one of the input transformation functions, and an accuracy level.
14. An apparatus comprising: a memory; and at least one processor coupled to the memory, wherein the at least one processor is operable to perform operations including: learning, using a first machine learning system, an input transformation function that transforms input data for a second machine learning system, wherein the learning is based on minimizing the sum of a task loss and a post-activation density loss; transforming the input data using the learned input transformation function to modify the post-activation density in order to reduce the amount of energy consumed in an inference task; and performing the inference task on the transformed input data using the second machine learning system.
15. The apparatus according to claim 14, wherein the second machine learning system is implemented using a neural network.
16. The apparatus according to claim 15, wherein learning the input transformation function includes balancing an inference-time task-specific loss against the post-activation density of the neural network based on a specified trade-off factor.
17. The apparatus according to any one of claims 14 to 16, wherein the operations further include generating additional input transformation functions by repeating the learning with different specified trade-off factors, the transforming being performed using a selected input transformation function corresponding to a selected trade-off factor among the specified trade-off factors.
18. The apparatus according to claim 17, wherein the operations further include obtaining a selection of the input transformation function.
19. The apparatus according to claim 18, wherein the obtained selection is one of the selected trade-off factor, an energy level, an energy saving level, one of the input transformation functions, and an accuracy level.
20. The apparatus according to claim 17, wherein the task loss ensures good accuracy of the inference task, while the post-activation density loss ensures that the post-activation density becomes small enough to obtain reduced energy consumption, and each specified trade-off factor functions to balance the task loss and the post-activation density loss in order to enable the trade-off between accuracy and energy consumption.
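The objective recited in claims 1, 3, and 7 and the trade-off selection recited in claims 4 to 6 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The following are all assumptions introduced for illustration: a frozen one-layer ReLU network standing in for the second machine learning system, an elementwise scale-and-shift input transformation, a finite-difference gradient-descent learner standing in for the first machine learning system, a mean-activation (L1) penalty used as a differentiable stand-in for the non-differentiable count of nonzero activations, a selection rule of "lowest density within a task-loss budget", and every name (`forward`, `learn_transform`, `select`, `Candidate`, and so on).

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vector = List[float]
Dataset = List[Tuple[Vector, float]]


def forward(W: Sequence[Vector], v: Vector, x: Vector) -> Tuple[Vector, float]:
    """Frozen toy stand-in for the second ML system: one ReLU layer plus a linear readout."""
    acts = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]
    return acts, sum(a * b for a, b in zip(v, acts))


def density(acts: Vector) -> float:
    """Post-activation density: fraction of post-ReLU activations that are nonzero."""
    return sum(1 for a in acts if a > 0.0) / len(acts)


def transform(params: Vector, x: Vector) -> Vector:
    """Hypothetical input transformation: elementwise scale (params[:n]) then shift (params[n:])."""
    n = len(x)
    return [s * xi + t for s, xi, t in zip(params[:n], x, params[n:])]


def objective(params: Vector, W, v, data: Dataset, tradeoff: float) -> float:
    """Task loss (squared error) plus tradeoff * mean activation, an L1 stand-in for density."""
    total = 0.0
    for x, y in data:
        acts, pred = forward(W, v, transform(params, x))
        total += (pred - y) ** 2 + tradeoff * sum(acts) / len(acts)
    return total / len(data)


def learn_transform(W, v, data: Dataset, tradeoff: float,
                    steps: int = 100, lr: float = 0.05, h: float = 1e-4) -> Tuple[Vector, float]:
    """Fit the transform by finite-difference gradient descent, starting from the identity."""
    n = len(data[0][0])
    params = [1.0] * n + [0.0] * n
    loss = objective(params, W, v, data, tradeoff)
    for _ in range(steps):
        grad = []
        for i in range(len(params)):
            bumped = params[:]
            bumped[i] += h
            grad.append((objective(bumped, W, v, data, tradeoff) - loss) / h)
        step = lr
        while step > 1e-8:  # backtracking: accept a step only if it lowers the loss
            cand = [p - step * g for p, g in zip(params, grad)]
            cand_loss = objective(cand, W, v, data, tradeoff)
            if cand_loss < loss:
                params, loss = cand, cand_loss
                break
            step /= 2
        else:
            break  # no improving step was found, so stop early
    return params, loss


@dataclass
class Candidate:
    tradeoff: float
    params: Vector
    mse: float           # task loss alone
    mean_density: float  # average post-activation density over the data


def evaluate(params: Vector, W, v, data: Dataset) -> Tuple[float, float]:
    """Return (task MSE, mean post-activation density) for a learned transform."""
    mse = dens = 0.0
    for x, y in data:
        acts, pred = forward(W, v, transform(params, x))
        mse += (pred - y) ** 2
        dens += density(acts)
    return mse / len(data), dens / len(data)


def select(candidates: List[Candidate], max_mse: float) -> Optional[Candidate]:
    """Pick the sparsest transform whose task loss stays within the accuracy budget."""
    ok = [c for c in candidates if c.mse <= max_mse]
    return min(ok, key=lambda c: c.mean_density) if ok else None


if __name__ == "__main__":
    rng = random.Random(0)
    W = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
    v = [rng.uniform(-1, 1) for _ in range(4)]
    xs = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
    data = [(x, forward(W, v, x)[1]) for x in xs]  # toy labels: the frozen model's own outputs
    candidates = []
    for lam in (0.0, 0.1, 1.0):  # one transform per trade-off factor
        params, _ = learn_transform(W, v, data, lam)
        candidates.append(Candidate(lam, params, *evaluate(params, W, v, data)))
    for c in candidates:
        print(f"tradeoff={c.tradeoff}: mse={c.mse:.4f}, density={c.mean_density:.2f}")
    print("selected tradeoff:", select(candidates, max_mse=0.01).tradeoff)
```

With a trade-off factor of 0 the identity transform is already optimal for these toy labels, so that candidate keeps the original accuracy; larger factors push the learned shift toward inputs that drive more pre-activations below zero. Selecting by accuracy level, as in claim 6, is modeled here by the `max_mse` bound; selecting by energy level would additionally require a hardware energy model that this sketch does not include.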