Processing system for executing an artificial intelligence application
The processing system optimizes computational load distribution using a reinforcement learning agent to address power and latency issues in wearable devices, ensuring efficient and reliable AI application execution.
Patent Information
- Authority / Receiving Office
- WO
- Patent Type
- Applications
- Current Assignee / Owner
- LUXOTTICA SRL
- Filing Date
- 2025-12-18
- Publication Date
- 2026-06-25
AI Technical Summary
Existing wearable devices face challenges in maintaining low power consumption while achieving high computational capability, and network interference and server workload fluctuations affect the reliability and latency of artificial intelligence applications, leading to suboptimal user experiences.
A processing system that dynamically adjusts the computational load distribution among wearable devices, mobile terminals, and cloud or edge servers using a reinforcement learning agent to optimize energy consumption and end-to-end latency, ensuring satisfactory user experience.
The system effectively minimizes energy consumption and latency fluctuations, adapting to network conditions and resource availability to maintain real-time performance of AI applications.
Figure EP2025087905_25062026_PF_FP_ABST
Abstract
Description
[0001] PROCESSING SYSTEM FOR EXECUTING AN ARTIFICIAL INTELLIGENCE APPLICATION
[0002] The present invention refers to a processing system for executing an artificial intelligence application.
[0003] In recent years, Artificial Intelligence (AI) has gained popularity across many fields, enabling the integration of interesting functionalities in various applications. Both academia and industry use AI and extended reality technology to offer users new ways to perceive their environments. Usually, these applications involve executing AI models that, in many cases, are large Deep Neural Networks (DNN). On the other hand, smart device manufacturers have been enhancing the computational and memory capabilities of edge and embedded devices (e.g., smartglasses, smartphones) to enable the execution of AI models. However, significant challenges persist concerning computational power and battery capacity when running complex AI models. Moreover, many applications aim to provide real-time processing, making resource and time constraints critical factors that significantly impact the user experience. Consequently, processing the entire DNN on the device often becomes impractical.
[0004] In the present disclosure the expression "AI application" indicates software involving the execution of an AI algorithm, for example a DNN. Several studies have tackled these problems by relying on DNN optimization techniques such as quantization, model compression, and knowledge distillation. These techniques can enable running the entire DNN on the device but can sometimes lead to a loss in model accuracy. Other studies have proposed DNN partitioning to address resource and time constraints. DNN partitioning involves splitting the DNN into different partitions, each suitable for running on a different device. Unfortunately, current solutions mainly focus on a single metric (e.g., energy or latency), and runtime management is limited to only two devices.
[0005] Wearable devices, such as smart eyewear, smartwatches, generic head-mountable devices and extended reality headsets, are emerging applications for this approach.
[0006] Head-mountable devices encompass any type of device designed to be worn over at least one eye of a wearer and configured to be mounted on a head, including any eyewear.
[0007] The wearable devices usually comprise one or more sensors for detecting environmental parameters, like for example temperature and humidity, and a microcontroller capable of processing the detected parameters and executing software applications.
[0008] Such processing may involve the execution of an AI application comprising, for example, a DNN.
[0009] In this application scenario, it is essential to keep the power consumption of such a microcontroller low, a requirement that conflicts with the high computational capability mentioned above.
[0010] For this reason, it is known that wearable devices gather data from their surroundings, for example through suitable sensors, process said data by running a partition of the DNN locally, and offload the remaining computation to a mobile phone and/or to a cloud or edge server. In this case, the wearable device, the mobile phone and the cloud or edge server form a processing system, wherein the wearable device is generally connected to the mobile phone through a wireless connection and the mobile phone is generally connected to the cloud or edge server through a telecommunication network such as a 5G network.
[0011] The data transmission between different entities involves the use of different network domains. On one hand, the mobile phone uses a 5G network to communicate with the cloud or edge server, on the other hand, the wearable device uses a Wi-Fi network to communicate with the mobile phone. As previously mentioned, Wi-Fi systems can experience interference when multiple devices operate on the same frequency. This interference is caused by overlapping signals and transmissions, leading to reduced quality and reliability of the wireless connection. This problem is exacerbated in 5G systems where next generation millimeter-wave technology, while promising higher transmission rates, is also characterized by larger variance. Consequently, communication may be subject to fluctuations and disruptions that affect network throughput.
[0012] Furthermore, cloud or edge servers need to handle many incoming requests. They often experience workload fluctuations, resulting in performance variability. These fluctuations affect server queuing times and, consequently, the end-to-end latency of AI applications. The computational load of the AI application execution may be distributed among the wearable device, the mobile phone and the cloud or edge server in order to satisfy system constraints and improve the user experience. In this case the AI application is divided into three partitions by means of so-called partition points, each partition being intended to be executed by the wearable device, the mobile phone and the cloud or edge server, respectively. The partition points determine how the computational load is distributed among the components of the processing system.
[0013] Considering different configurations for the same AI application with various partition points, the challenge lies in determining the most appropriate configuration to use in real time. The selected configuration must ensure a satisfying user experience while adapting to fluctuations in communication conditions and server latency.
[0014] The object of the present invention is to overcome the above-mentioned drawbacks and in particular to devise a processing system for executing an artificial intelligence application that optimizes energy consumption while maintaining the end-to-end latency at levels that guarantee a satisfactory user experience. This and other objects according to the present invention are achieved by a processing system for executing an artificial intelligence application according to claim 1.
[0015] Further characteristics of the processing system for executing an artificial intelligence application are the objects of the dependent claims.
[0016] The characteristics and advantages of the processing system for executing an artificial intelligence application according to the present invention will be more evident from the following exemplary though non-limiting description, referring to the attached schematic drawings, in which:
[0017] - Figure 1 is a block scheme of an embodiment of a processing system according to the present invention;
- Figure 2 is a representation of an example of an operating scenario of the processing system of Figure 1;
- Figures 3a-3f are schematic representations of six computational load configurations used in the processing system according to the invention.
[0018] With reference to the figures, a processing system for executing an artificial intelligence (AI) application at an execution rate is shown, globally referred to as 100. The AI application is a computer program or software intended to be executed within an application time Lmax. The application time is a predefined requirement of the application and corresponds to the time duration that guarantees a satisfactory user experience. The execution rate is related to the application time and is approximately equal to 1/Lmax.
[0019] The processing system 100 comprises a wearable device 110, an electronic mobile terminal 120 connected to the wearable device 110 through a wireless connection 140, and a cloud or edge server 130 connected to the electronic mobile terminal 120 through a telecommunications network 150.
[0020] For example, the wearable device 110 may be an eyewear, a generic head-mountable device, a smartwatch and so on. Head-mountable devices encompass any type of device designed to be worn over at least one eye of a wearer, wherein the device is configured to be mounted on a head, including any eyewear.
[0021] For example, the head-mountable device may be a helmet, or a visor, or a mask or an eyewear and so on.
[0022] In particular, in the illustrated figures the wearable device 110 is an eyewear. For example, the electronic mobile terminal 120 may be a smartphone, a tablet, a laptop and so on.
[0023] For example, the wireless connection 140 may be a point-to-point connection such as a Wi-Fi connection, for example according to the WiFi7 standard, or a Bluetooth connection and so on.
[0024] For example, the telecommunications network 150 may be a fee-based network such as a 5G network, a 4G network and so on.
[0025] The AI application comprises, for example, a DNN which in turn comprises a plurality of AI tasks, including object classification, object detection, object tracking, and so on.
[0026] For example, the AI application is a vision application, a Simultaneous Localization and Mapping (SLAM) application and so on.
[0027] The AI application is configured to be divided into a plurality of partitions that can be executed by different processing devices.
[0028] To ensure a positive user experience, certain time constraints on the AI application execution must be met. For example, a SLAM application requires a frame rate within the range of 15-30 frames per second (fps), implying that the total application time Lmax should be approximately 33-66 ms.
[0029] For tracking applications, it is crucial to maintain a minimum application frame rate of 30 fps, translating to an end-to-end application time of less than 33 ms.
[0030] For low-frame-rate videos, a rate of 1-5 fps (200-1,000 ms) is acceptable.
[0031] In any case, the AI application is executed continuously with an execution rate of about 1/Lmax. If the execution time is greater than the application time Lmax, the execution rate is less than 1/Lmax. It is therefore important that the execution time does not exceed the application time Lmax. For example, for a SLAM application the AI application execution rate must be equal to the frame rate to be processed.
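The relation between target frame rate, application time Lmax and execution rate can be checked with a few lines of Python. This is a minimal sketch, assuming that executions run strictly one after another (no pipelining), so a slow execution delays the next one; the function names are hypothetical.

```python
def application_time_ms(fps: float) -> float:
    """Application time Lmax, in milliseconds, for a target frame rate."""
    return 1000.0 / fps


def execution_rate(execution_time_s: float, l_max_s: float) -> float:
    """Executions per second under the sequential-execution assumption.

    A new input is available every Lmax seconds; if an execution takes longer
    than Lmax, the next one starts late and the rate falls below 1/Lmax.
    """
    return 1.0 / max(execution_time_s, l_max_s)


def is_violation(execution_time_s: float, l_max_s: float) -> bool:
    """True when an execution exceeds the application time Lmax."""
    return execution_time_s > l_max_s


# SLAM at 30 and 15 fps gives Lmax of roughly 33.3 ms and 66.7 ms.
print(round(application_time_ms(30), 1), round(application_time_ms(15), 1))
```

For instance, a 50 ms execution against a 30 fps target yields only 20 executions per second and counts as a violation.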
[0032] In the following, references to the cloud server are to be understood as also applying to the edge server.
[0033] The wearable device 110, the electronic mobile terminal 120 and the cloud server 130 comprise a first electronic unit 111, a second electronic unit 121 and a third electronic unit 131, respectively, each of which is configured to execute at least one partition of an AI application. In this description, the term "electronic unit" indicates at least a processing unit, such as a microprocessor, a neural processing unit or a graphics processing unit, or even a processing and control unit such as a microcontroller.
[0034] The wearable device 110 may also comprise a data acquisition system connected to the first electronic unit 111 and configured to acquire first input data intended to be processed by the AI application.
[0035] The data acquisition system may comprise one or more sensors 112 for detecting environmental parameters or movement parameters and one or more cameras 114 for acquiring data including images and / or videos.
[0036] Moreover, the wearable device 110 comprises a first battery 113 to power all the electronic components included therein, namely the data acquisition system and the first electronic unit 111.
[0037] The electronic mobile terminal 120 likewise comprises a second battery 122 to power all the electronic components included therein.
[0038] The AI application is intended to process the data detected by the sensors 112 or acquired by the cameras 114; these data are referred to in the following as first input data.
[0039] When the AI application is divided into three partitions, the following operations occur:
[0040] - the first electronic unit 111 executes the first partition of the AI application on the basis of part of the first input data and sends second input data to the electronic mobile terminal 120, wherein the second input data comprise the result of the execution of the first partition of the AI application and the unused first input data;
[0041] - the second electronic unit 121 executes the second partition of the AI application on the basis of the second input data and sends third input data to the cloud server 130, wherein the third input data comprise the result of the execution of the second partition of the AI application and the unused second input data;
[0042] - the third electronic unit 131 executes the third partition of the AI application on the basis of the third input data received from the electronic mobile terminal 120 and sends the result of the execution of the third partition of the AI application to the electronic mobile terminal 120, which then forwards the result to the wearable device 110.
[0043] The execution scheme illustrated above may also be applied to the case in which the AI application is divided into just two partitions; in this case one of the three processing devices is not operating. In any case, the transmission of the result from the last processing device to the wearable device 110 is called the final download. The final download is carried out by transmitting the result from the cloud server 130 to the electronic mobile terminal 120 through the telecommunication network and from the electronic mobile terminal 120 to the wearable device 110 through the wireless connection.
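The three-stage execution scheme can be simulated in-process with the Python sketch below. It assumes the AI application is a simple chain of layers, uses toy scalar functions as hypothetical stand-ins for DNN layers, and ignores networking as well as the forwarding of unused input data. An empty partition forwards its input unchanged, which is how a device that "is not operating" behaves here.

```python
from typing import Callable, List, Sequence, Tuple

Layer = Callable[[float], float]


def split_application(layers: Sequence[Layer], s1: int, s2: int) -> Tuple[List[Layer], List[Layer], List[Layer]]:
    """Split a chain of layers at partition points s1 <= s2 into three partitions."""
    if not 0 <= s1 <= s2 <= len(layers):
        raise ValueError("partition points must satisfy 0 <= s1 <= s2 <= len(layers)")
    return list(layers[:s1]), list(layers[s1:s2]), list(layers[s2:])


def run_partition(partition: Sequence[Layer], data: float) -> float:
    """Run the layers of one partition in order; an empty partition forwards data unchanged."""
    for layer in partition:
        data = layer(data)
    return data


def run_three_way(layers: Sequence[Layer], s1: int, s2: int, first_input: float) -> float:
    p1, p2, p3 = split_application(layers, s1, s2)
    second_input = run_partition(p1, first_input)  # first electronic unit 111 (wearable)
    third_input = run_partition(p2, second_input)  # second electronic unit 121 (mobile)
    result = run_partition(p3, third_input)        # third electronic unit 131 (cloud)
    return result  # final download: cloud -> mobile -> wearable


toy_layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
print(run_three_way(toy_layers, 1, 2, 5.0))
```

Whatever the partition points, the result is the same; only where each layer runs changes, which is exactly what the computational load configuration controls.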
[0044] The wearable device 110, the electronic mobile terminal 120 and the cloud server 130 are all configured to transmit the input data to one another.
[0045] Moreover, the wearable device 110 is configured for transmitting wearable device state data to the electronic mobile terminal 120.
[0046] The wearable device state data comprise data indicating the state of charge of the first battery 113 and the computational load of the first electronic unit 111. The electronic mobile terminal 120 is also configured to collect the wearable device state data, cloud server state data, electronic mobile terminal state data, wireless connection state data, and telecommunication network state data. The cloud server state data comprise data indicating the execution time of the respective partition of the AI application at the preceding request; this execution time is a parameter from which the computational load of the third electronic unit 131 may be inferred.
[0047] The electronic mobile terminal state data comprise data indicating the charge level of the second battery 122 and the execution time of the respective partition of the AI application at the preceding request; this execution time is a parameter from which the computational load of the second electronic unit 121 may be inferred. The wireless connection state data and the telecommunication network state data comprise data indicating the bandwidth occupancy of the wireless connection and of the telecommunication network, respectively. In particular, they are determined by the second electronic unit of the electronic mobile terminal on the basis of the transmission rates of such connections, derived from the data packet flow.
[0048] The state data and input data are collected by the electronic mobile terminal 120 periodically with a control period τ, for example 5 s.
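The state data collected every control period can be grouped into one record that the RL agent observes. The Python sketch below is illustrative only: the field names, the 0..1 normalization of battery levels, load and occupancy, and the helper that turns a packet log into a transmission rate are assumptions, not part of the description.

```python
from dataclasses import astuple, dataclass
from typing import Iterable, Tuple


@dataclass
class SystemState:
    """Hypothetical state record collected by the mobile terminal every control period tau."""
    wd_battery_level: float    # charge of the first battery 113 (assumed 0..1)
    wd_load: float             # computational load of the first electronic unit 111 (assumed 0..1)
    p_battery_level: float     # charge of the second battery 122 (assumed 0..1)
    p_exec_time_s: float       # mobile partition execution time at the preceding request
    cloud_exec_time_s: float   # cloud partition execution time at the preceding request
    wireless_occupancy: float  # bandwidth occupancy of the wireless connection 140 (assumed 0..1)
    network_occupancy: float   # bandwidth occupancy of the telecommunication network 150 (assumed 0..1)

    def to_vector(self) -> Tuple[float, ...]:
        """Flatten the record into an observation vector for the RL agent."""
        return astuple(self)


def throughput_from_packets(packets: Iterable[Tuple[float, int]], window_s: float) -> float:
    """Bytes per second from (timestamp, size_bytes) records observed within one window."""
    return sum(size for _, size in packets) / window_s


state = SystemState(0.8, 0.3, 0.6, 0.012, 0.020, 0.4, 0.5)
print(state.to_vector())
```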
[0049] The processing system is configured so that, at the first execution of the AI application, the AI application is executed by the first electronic unit (111), the second electronic unit (121) and the third electronic unit (131) according to a first predefined computational load distribution. In particular, the AI application is divided into three partitions according to the first predefined computational load distribution, wherein the first partition is executed by the first electronic unit 111, the second partition by the second electronic unit 121 and the third partition by the third electronic unit 131.
[0050] For example, the first predefined computational load distribution provides a distribution of the computational load in equal parts among the three different electronic units.
[0051] Advantageously, the second electronic unit 121 is programmed to execute, after the first execution of the AI application, a Reinforcement Learning (RL) agent configured to determine at least one second distribution of the computational load of the AI application among the first electronic unit 111, the second electronic unit 121 and the third electronic unit 131 on the basis of the wearable device state data, the electronic mobile terminal state data, the cloud server state data, the wireless connection state data and the telecommunication network state data, so as to minimize the energy consumption of the first battery while keeping the end-to-end latency below the application time. The RL agent is a software module based on an AI algorithm. The RL agent is initially executed periodically, with a period equal to the control period τ. Preferably, the second electronic unit 121 is configured to record the execution time of the AI application and compare it with the application time; if the execution time is greater than the application time, the second electronic unit 121 records a violation. The second electronic unit 121 is also configured to execute the RL agent immediately if the number of consecutive violations exceeds a predefined threshold value. In this case the RL agent is executed on demand, without respecting the control period τ.
[0052] Determining the at least one second distribution of the computational load of the AI application comprises the steps of determining two partition points and then dividing the AI application into three partitions by means of the determined partition points, wherein the first partition is intended to be executed by the first electronic unit 111, the second partition by the second electronic unit 121 and the third partition by the third electronic unit 131.
[0053] The first electronic unit 111, the second electronic unit 121 and the third electronic unit 131 are then programmed to execute the AI application according to the at least one second distribution of the computational load determined by the RL agent. Each distribution of computational load is also called a computational load configuration.
[0054] Preferably, the at least one second distribution of the computational load is determined so as to also minimize the power consumption of the second battery. Preferably, it is also determined so as to minimize the costs related to the use of the telecommunication network 150 and/or the electric energy.
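The triggering policy of the RL agent (periodic every control period τ, plus on demand after too many consecutive violations) can be sketched as a small Python scheduler. The class and field names are hypothetical, and resetting both the violation counter and the period timer after every run is an assumption not stated in the text.

```python
from dataclasses import dataclass


@dataclass
class RLTrigger:
    """Hypothetical scheduler deciding when the RL agent should run."""
    l_max: float             # application time Lmax (s)
    control_period: float    # control period tau (s), e.g. 5.0
    threshold: int           # predefined threshold on consecutive violations
    last_run: float = 0.0
    consecutive_violations: int = 0

    def should_run(self, now: float, execution_time: float) -> bool:
        """Record one execution time and return True if the RL agent must run now."""
        if execution_time > self.l_max:
            self.consecutive_violations += 1
        else:
            self.consecutive_violations = 0
        periodic = now - self.last_run >= self.control_period
        on_demand = self.consecutive_violations > self.threshold
        if periodic or on_demand:
            # Assumption: restart the period and clear the counter after each run.
            self.last_run = now
            self.consecutive_violations = 0
            return True
        return False


trigger = RLTrigger(l_max=1 / 30, control_period=5.0, threshold=2)
decisions = [trigger.should_run(now, exec_time) for now, exec_time in
             [(1.0, 0.05), (1.1, 0.05), (1.2, 0.05), (2.0, 0.01), (6.3, 0.01)]]
print(decisions)
```

Here the third consecutive violation exceeds the threshold of two and forces an on-demand run at t = 1.2 s, after which the next periodic run falls due at t = 6.3 s.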
[0055] The RL agent will be described in the following.
[0056] The set of indices of the computing devices of the processing system is denoted by $D = \{1, 2, 3\}$, so that $d = 1$ denotes the wearable device 110 and hence the first electronic unit 111, $d = 2$ denotes the electronic mobile terminal 120 and hence the second electronic unit 121, and $d = 3$ denotes the cloud server 130 and hence the third electronic unit 131. The set of all candidate computational load configurations is denoted by $K$. Each element $k_i \in K$ represents a computational load configuration including multiple AI application partitions, defined as

$$k_i = \{p_d^i\}_{d \in D}.$$
[0059] Figures 3a-3f illustrate representative candidate computational load configurations for the AI application. To simplify the modeling and the identification of the partition executed on a specific device, three partitions are considered for every computational load configuration $k_i$, and a new, empty partition is introduced to indicate that nothing is executed. In this way, it is possible to characterize a priori which partition runs on which device and, therefore, to establish a one-to-one correspondence between a partition and a device.
[0062] Indeed, for each configuration $k_i$, the first partition $p_1^i$ is always executed on the wearable device 110, the second partition $p_2^i$ on the electronic mobile terminal 120, and the third partition $p_3^i$ on the cloud server 130. Figures 3a and 3c show the case of one partition, meaning that the entire AI application is executed on the electronic mobile terminal 120 or on the cloud server 130, respectively. Figures 3b, 3e and 3f represent the case of two partitions, which run respectively on the electronic mobile terminal 120 and the cloud server 130, on the wearable device 110 and the cloud server 130, or on the wearable device 110 and the electronic mobile terminal 120, while the third computational component remains idle. Finally, Figure 3d illustrates the case of three partitions, executed by all three components. Throughout the AI application execution, the RL agent may switch from one configuration to another; for instance, Figures 3a and 3b might represent the solutions adopted when the battery level of the wearable device is low, since in both the wearable device executes nothing. It is assumed that all partitions fit in the devices' memory, since configurations violating this constraint can be filtered out a priori. The wearable device state data, the electronic mobile terminal state data and the cloud server state data are measured with a period equal to the application time Lmax. Unless otherwise specified, their values, which may vary from one control time interval to the next, are considered constant within a specific time window. A superscript $\tau$ can therefore be added to a parameter name only to mark its relationship with a specific time slot.
[0063] It is assumed that the AI application receives input data with a frequency $\lambda = 1/L_{max}$.
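A candidate set $K$ can be generated from pairs of partition points, as in the Python sketch below. It assumes the AI application is a chain of `n_layers` layers and that a configuration is a hypothetical pair `(s1, s2)`: layers before `s1` go to the wearable device, layers in `[s1, s2)` to the mobile terminal and the rest to the cloud. The optional `fits` predicate stands in for the a priori memory filter. With at least two layers, the active-device patterns include the six configurations of Figures 3a-3f plus execution entirely on the wearable device.

```python
from typing import Callable, FrozenSet, List, Optional, Tuple

Config = Tuple[int, int]  # partition points (s1, s2) with 0 <= s1 <= s2 <= n_layers


def candidate_configs(n_layers: int, fits: Optional[Callable[[Config], bool]] = None) -> List[Config]:
    """Enumerate K as every pair of partition points, optionally filtered a priori."""
    configs = []
    for s1 in range(n_layers + 1):
        for s2 in range(s1, n_layers + 1):
            if fits is None or fits((s1, s2)):
                configs.append((s1, s2))
    return configs


def active_devices(config: Config, n_layers: int) -> FrozenSet[int]:
    """Devices d in D = {1, 2, 3} whose partition p_d is non-empty."""
    s1, s2 = config
    sizes = {1: s1, 2: s2 - s1, 3: n_layers - s2}
    return frozenset(d for d, size in sizes.items() if size > 0)


K = candidate_configs(3)
print(len(K))
```

The number of candidates grows as $(N+1)(N+2)/2$ for $N$ layers, which keeps the linear scan over $K$ cheap for typical DNN depths.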
[0064] Let $x_i$ be a binary variable that equals 1 if configuration $k_i$ is selected, and let $|K|$ denote the total number of configurations. A single configuration must be chosen in each time slot $\tau$, which can be expressed by the following constraint:

$$\sum_{i=1}^{|K|} x_i = 1$$
[0069] When a computing device (one among the wearable device 110, the electronic mobile terminal 120 and the cloud server 130) does not run the entire AI application or its last layer, it sends an intermediate tensor to the next component, this intermediate tensor being the result of the execution of the respective partition. This intermediate tensor, together with the unused input data, forms the input data for the following computing device, as previously reported.
[0070] $\delta_{d,d+1}^i$ denotes the amount of per-request data related to configuration $k_i$ that is sent by partition $p_d^i$ to partition $p_{d+1}^i$; the per-request data is the intermediate tensor size in bytes. Thus, $\delta_{12}^i$ is the amount of per-request data sent from the wearable device 110 to the electronic mobile terminal 120, and $\delta_{23}^i$ the amount sent from the electronic mobile terminal 120 to the cloud server 130. When everything is executed on the wearable device 110, $\delta_{12}^i = \delta_{23}^i = 0$, while when everything is offloaded to the cloud server 130, $\delta_{12}^i = \delta_{23}^i = \delta_0$, where $\delta_0$ is the size of the input tensor. The input tensor is the first input data to be processed by the system. For example, in the SLAM application the input tensor is a vector representation of the raw image acquired by the one or more cameras 114 of the wearable device 110.
[0071] The parameter $\rho_d^i$, measured in floating-point operations, denotes the per-request computational workload of running partition $p_d^i$. This information can easily be gathered with development frameworks such as PyTorch, TensorFlow, etc. To a first approximation, this quantity determines the energy consumption of the computing resources of device $d$ when running partition $p_d^i$. Moreover, each partition $p_d^i$ is associated with a partition latency $t_d^i$, which defines the processing time of partition $p_d^i$ when running on the associated device. This latency can be computed from a partition latency model that predicts the AI application latency on the specific device, or by runtime profiling of the AI application.
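The per-request data $\delta_{12}^i$ and $\delta_{23}^i$ can be derived from partition points, as in this Python sketch. It assumes a chained model where each layer consumes the previous layer's output, `out_sizes[j]` is a hypothetical list of output-tensor sizes in bytes, and the forwarding of unused input data and the final download are ignored.

```python
from typing import Sequence, Tuple


def per_request_data(s1: int, s2: int, delta0: int, out_sizes: Sequence[int]) -> Tuple[int, int]:
    """Return (delta12, delta23) in bytes for partition points s1 <= s2.

    out_sizes[j] is the size of the tensor produced by layer j and delta0 is
    the size of the input tensor.
    """
    n = len(out_sizes)

    def tensor_at(s: int) -> int:
        # Tensor crossing a cut placed before layer s.
        return delta0 if s == 0 else out_sizes[s - 1]

    delta12 = tensor_at(s1) if s1 < n else 0  # nothing left for the mobile or the cloud
    delta23 = tensor_at(s2) if s2 < n else 0  # nothing left for the cloud
    return delta12, delta23


print(per_request_data(1, 2, 1000, [500, 200, 10]))
```

The two boundary cases of the text fall out directly: all layers on the wearable device gives `(0, 0)` and everything offloaded to the cloud gives `(delta0, delta0)`.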
[0072] The network latencies are estimated through the access time $T$, the bandwidth $\phi$ and the size of the data to transfer:

$$l_{network} = T + \frac{data\_size}{\phi}$$

where $T$ and $\phi$ depend on the network domain considered (for example WiFi7 or 5G). In particular, $T_{5G}$ depends on the geographic location of the network node that the user wants to access.
[0077] Preferably, for simplicity, the access time $T$ of the networks used in the processing system may be considered time-invariant, even though it may sometimes fluctuate with the load on the network (for instance, if many users are accessing the network concurrently, they may experience a longer access time than when the network is not saturated).
[0078] Since information about the network access time and bandwidth is not easily accessible in practical applications, the network throughput characterizing each domain is advantageously considered instead, denoted as $r_{wireless\_connection}$ and $r_{telecommunications\_network}$, respectively.
[0079] In the following, reference is made by way of example to the particular case in which the wireless connection is a WiFi7 connection and the telecommunication network is a 5G network; all the arguments and equations apply to any network and wireless connection. Hence $r_{wireless\_connection} = r_{WiFi7}$ and $r_{telecommunications\_network} = r_{5G}$. Any parameter with subscript "5G" or "WiFi7" may be replaced with the corresponding parameter of the network and connection available in the particular processing system.
[0080] The network throughput can be influenced by many factors, including the network access time, network congestion, and packet loss, and it enters the computation of both the data-transfer time and the associated energy consumption. Therefore, the network latencies can be expressed by the following equations:

$$l_{WiFi7} = \frac{data\_size}{r_{WiFi7}}, \qquad l_{5G} = \frac{data\_size}{r_{5G}}$$
[0085] Note that the WiFi7 throughput may be reduced when two users approach each other, due to the interference between their two WiFi hotspots. As already mentioned, the network throughput is measured in each time interval $\tau$. The throughput is therefore related to that time interval and denoted as $r^{\tau}_{WiFi7}$ and $r^{\tau}_{5G}$ for the two network domains. Since it is considered constant within the time period $\tau$, the superscript is dropped in the rest of the formulation.
[0086] The data-transfer latencies $l_{sp}$ between the wearable device 110 and the electronic mobile terminal 120, and $l_{pc}$ between the electronic mobile terminal 120 and the cloud server 130, are given by:

$$l_{sp} = \frac{\sum_{i=1}^{|K|} \delta_{12}^i x_i}{r_{WiFi7}}, \qquad l_{pc} = \frac{\sum_{i=1}^{|K|} \delta_{23}^i x_i}{r_{5G}}$$
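Both latency models above, the access-time model and the throughput-only model, and the selection-weighted transfer latencies $l_{sp}$ and $l_{pc}$ are sketched below in Python. Units are assumed to be bytes, seconds and bytes per second; `x` is a one-hot selection vector over the candidate configurations, and all function names are illustrative.

```python
from typing import Sequence, Tuple


def latency_access_model(data_size: float, access_time: float, bandwidth: float) -> float:
    """l_network = T + data_size / phi."""
    return access_time + data_size / bandwidth


def latency_throughput_model(data_size: float, throughput: float) -> float:
    """l = data_size / r, with r the throughput measured in the current slot."""
    return data_size / throughput


def transfer_latencies(x: Sequence[int], delta12: Sequence[float], delta23: Sequence[float],
                       r_wifi7: float, r_5g: float) -> Tuple[float, float]:
    """l_sp and l_pc for a one-hot selection vector x over the |K| configurations."""
    l_sp = sum(d * xi for d, xi in zip(delta12, x)) / r_wifi7
    l_pc = sum(d * xi for d, xi in zip(delta23, x)) / r_5g
    return l_sp, l_pc


l_sp, l_pc = transfer_latencies([0, 1], [1e6, 5e5], [2e5, 1e5], r_wifi7=1e8, r_5g=5e7)
print(l_sp, l_pc)
```

Because exactly one $x_i$ equals 1, each sum simply picks the data size of the selected configuration.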
[0090] The cost $c_{5G}$ associated with the 5G connection used to offload the computation from the electronic mobile terminal 120 to the cloud server 130 depends on the size $\delta_{23}^i$ of the data sent from the electronic mobile terminal 120 to the cloud server 130 and on a cost $g$ paid to transfer one byte of data. Hence, a cost per unit time $c_{5G}$, for example a cost per second, can be expressed as:

$$c_{5G} = \lambda\, g \sum_{i=1}^{|K|} \delta_{23}^i x_i$$

[0094] On the other hand, the WiFi7 hotspot between the wearable device 110 and the electronic mobile terminal 120 does not require any internet connection; therefore, the cost of the wireless connection is not considered in this case. However, if the wireless connection involves the use of a paid network, its cost may be taken into account.
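The 5G cost per unit time can be evaluated directly, as in this Python sketch. The numbers in the usage line (30 requests per second, a price of 1e-9 currency units per byte, 200 kB per request) are hypothetical.

```python
from typing import Sequence


def cost_5g_per_second(rate_lambda: float, g: float, delta23: Sequence[float], x: Sequence[int]) -> float:
    """c_5G = lambda * g * sum_i delta23_i * x_i (currency per second)."""
    return rate_lambda * g * sum(d * xi for d, xi in zip(delta23, x))


# Selecting a configuration that uploads 200 kB per request at 30 requests/s.
print(cost_5g_per_second(30.0, 1e-9, [2e5, 0.0], [1, 0]))
```

Configurations that never reach the cloud have $\delta_{23}^i = 0$ and therefore no 5G cost.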
[0095] The end-to-end latency comprises the execution times of the different entities and the data-transfer latencies between them. Let $l_{wd}$ and $l_p$ be the execution times of the wearable device 110 and of the electronic mobile terminal 120, respectively, which can be expressed as:

$$l_{wd} = \sum_{i=1}^{|K|} t_1^i x_i, \qquad l_p = \sum_{i=1}^{|K|} t_2^i x_i$$
[0099] Similarly, a simple cloud server model is considered, in which the latency and its variance are measured in the control time slots with period $\tau$. The total latency on the cloud server 130 is denoted by $l^{\tau}_{cloud}$ and can be defined as

$$l^{\tau}_{cloud} = \sum_{i=1}^{|K|} t_3^i(\Lambda^{\tau})\, x_i$$

where $\Lambda^{\tau}$ is the (unknown) workload injected into the cloud server 130 in the time slot $\tau$.
[0102] Using the data-transfer latencies $l_{sp}$ and $l_{pc}$, the end-to-end latency is given by

$$l_{total} = l_{wd} + l_p + l_{sp} + l_{pc} + l_{cloud}.$$
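The end-to-end latency and the check against Lmax can be sketched in Python as follows. The per-configuration latency lists and the transfer latencies in the usage line are hypothetical values in seconds; `t3` stands for the cloud latency measured in the current slot.

```python
from typing import Sequence


def selected(values: Sequence[float], x: Sequence[int]) -> float:
    """sum_i values_i * x_i, i.e. the value of the selected configuration."""
    return sum(v * xi for v, xi in zip(values, x))


def end_to_end_latency(x: Sequence[int], t1: Sequence[float], t2: Sequence[float],
                       t3: Sequence[float], l_sp: float, l_pc: float) -> float:
    """l_total = l_wd + l_p + l_sp + l_pc + l_cloud."""
    l_wd = selected(t1, x)     # wearable partition latency
    l_p = selected(t2, x)      # mobile partition latency
    l_cloud = selected(t3, x)  # cloud latency measured in the current slot
    return l_wd + l_p + l_sp + l_pc + l_cloud


l_total = end_to_end_latency(x=[0, 1], t1=[0.0, 0.004], t2=[0.030, 0.006],
                             t3=[0.0, 0.010], l_sp=0.005, l_pc=0.002)
print(round(l_total, 3), l_total <= 1 / 30)  # compare with Lmax for a 30 fps target
```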
[0103] The end-to-end latency represents the execution time of the Al application.
[0104] The power consumption of the devices includes four components, namely:
[0105] The power consumed by the wearable device 110 and the electronic mobile terminal 120 when running a partition locally, denoted as $e_{exe,wd}$ and $e_{exe,p}$ respectively, which account for the CPU and AI neural engine / GPU energy consumption models;
[0106] The power consumption of the wireless connection interface (WNI) and of the telecommunication network interface (TNI), for example the 5G network interface (5GI), when uploading the intermediate tensor to the electronic mobile terminal 120 and the cloud server 130, denoted as $e_{wnu}$ and $e_{5gu}$ respectively;
[0107] The power consumed by the WNI and 5GI when retrieving the remote execution results, denoted as $e_{wnd}$ and $e_{5gd}$ respectively;
[0108] The power consumed by the WNI and 5GI when the wearable device 110 and the electronic mobile terminal 120 are in the idle state, denoted as $e_{wnidle}$ and $e_{5gidle}$ respectively. The idle state is enabled when the wearable device 110 and the electronic mobile terminal 120 are waiting for the results of the execution of the remote partitions. The power terms $e_{exe,wd}$ and $e_{exe,p}$ can be computed as follows:
[0109] |K|
[0110] ^exewd '
[0111] i=l
[0112] |K|
[0113] &exep ' A
[0114]
[0115] i = l
[0116] where zWd and zp
[0117]
[0118] the energy consumption for performing one floating point operation on the wearable device 110 and the electronic mobile terminal 120, respectively
[0119]
[0120] and p2are the per-request computational workload
[0121]
[0122] floating point operations) of running the partition p and P2°f configuration k1on the wearable device 110 and the electronic mobile terminal 120, respectively. When the devices offload all or some parts of the computation, they consume the power ewnuand esgu to send the intermediate tensor. Let 0Wd and 0Pdenote the power consumption of the wearable device 110 WNI and the electronic mobile terminal 120 5G interface when transmitting data, respectively. The value of ewnuand esgu are computed as:
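[0123]
$$e_{wnu} = \theta_{wd} \, \frac{\sum_{i=1}^{|K|} x_i \, s_{p_1}^{i}}{r_{WiFi7}}$$
$$e_{5gu} = \theta_p \, \frac{\sum_{i=1}^{|K|} x_i \, s_{p_2}^{i}}{r_{5G}}$$
where $s_{p_1}^{i}$ and $s_{p_2}^{i}$ denote the size of the intermediate tensor produced by partitions $p_1$ and $p_2$ of configuration $k_i$, and $r_{WiFi7}$ and $r_{5G}$ are the Wi-Fi 7 and 5G data rates.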
[0130] The energy of the final download ($e_{wnd}$ and $e_{5gd}$) and the energy consumed in the idle state ($e_{wn}^{idle}$ and $e_{5g}^{idle}$) may be neglected.
[0131] In this case, the total power consumption $e_{wd}$ and $e_p$ can be calculated as $e_{wd} = e_{exe}^{wd} + e_{wnu}$ and $e_p = e_{exe}^{p} + e_{5gu}$. The energy consumption in a specific time slot of duration $\tau$ is computed as follows:
[0132]
$$E_{wd} = e_{wd} \, \tau, \qquad E_p = e_p \, \tau$$
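The energy model of [0105]-[0132] can be sketched as plain functions. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the intermediate-tensor sizes are assumed to be in bits and the data rates in bit/s, and the numbers in the usage lines are made up. As in [0130], download and idle terms are neglected.

```python
from typing import Sequence, Tuple


def execution_energy(x: Sequence[int], z: float, workload: Sequence[float]) -> float:
    """Local execution term: e_exe = sum_i x_i * z * p^i.

    z is the energy per floating point operation (J/FLOP); workload[i] is the
    per-request FLOP count of the local partition of configuration k_i.
    """
    return sum(xi * z * w for xi, w in zip(x, workload))


def upload_energy(x: Sequence[int], theta: float, tensor_bits: Sequence[float],
                  rate_bps: float) -> float:
    """Upload term: e_u = theta * sum_i x_i * s^i / r.

    theta is the transmit power (W); tensor_bits[i] is the assumed size in bits of
    the intermediate tensor sent for configuration k_i; rate_bps is the link rate.
    """
    return theta * sum(xi * s for xi, s in zip(x, tensor_bits)) / rate_bps


def device_energies(x, z_wd, z_p, p1, p2, theta_wd, theta_p, s1, s2,
                    r_wifi7, r_5g, tau) -> Tuple[float, float, float, float]:
    """Return (e_wd, e_p, E_wd, E_p); download and idle terms are neglected."""
    e_wd = execution_energy(x, z_wd, p1) + upload_energy(x, theta_wd, s1, r_wifi7)
    e_p = execution_energy(x, z_p, p2) + upload_energy(x, theta_p, s2, r_5g)
    return e_wd, e_p, e_wd * tau, e_p * tau


# Illustrative numbers only: two configurations, the second one selected (x = [0, 1]).
e_wd, e_p, E_wd, E_p = device_energies(
    x=[0, 1], z_wd=2e-9, z_p=5e-10, p1=[1e9, 4e8], p2=[0.0, 6e8],
    theta_wd=0.8, theta_p=1.5, s1=[0.0, 8e6], s2=[0.0, 1e6],
    r_wifi7=4e8, r_5g=1e8, tau=2.0,
)
```

With these values the wearable spends 0.8 on execution plus 0.016 on the Wi-Fi upload, and the phone 0.3 plus 0.015 on the 5G upload; only the selected configuration contributes because $x$ is one-hot.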
[0133] As previously mentioned, the RL agent periodically determines the computational load configuration depending on the state of the processing system. This decision problem is formulated as the minimization of the total processing cost:
[0134]
$$\min_{x_1, \dots, x_{|K|}} \; \big(\alpha \, (e_{wd} + e_p) + c_{5G}\big) \, \tau$$
[0135] subject to the following constraints:
[0136]
$$\sum_{i=1}^{|K|} x_i = 1$$
$$l_{total} \le L_{max}$$
$$x_i \in \{0, 1\}, \quad \forall i \in \{1, 2, \dots, |K|\}$$
[0141] The goal is to minimize both the energy consumption and the cost of the 5G connection and of the energy. If the transmission costs are not considered, the decision problem may be expressed as $\min \alpha \, (e_{wd} + e_p)$.
[0142] If only the energy consumption of the wearable device 110 is considered, the decision problem may be expressed as $\min \alpha \, e_{wd}$.
[0143] In the objective function above, $\alpha$ denotes the energy unit price (measured in currency/J), and $L_{max}$ is the maximum latency that guarantees a satisfactory user experience, i.e., the application time. The problem complexity is $O(|K|)$, so it can be solved in linear time. What makes the problem difficult is the network fluctuation and the cloud latency, which make the parameters of the optimization problem time-varying. RL makes it possible to learn the environment dynamics and the time evolution of such parameters. Although the objective function is linear in the decision variables, the presence of multiple terms still requires careful optimization.
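Because the constraint $\sum_i x_i = 1$ selects exactly one configuration, for a fixed snapshot of the parameters the problem above reduces to a single scan over the $|K|$ configurations, which is the linear complexity stated in [0143]. The sketch below is a static baseline under that assumption, not the RL method of the invention: it presumes per-configuration estimates of $e_{wd}$, $e_p$, $c_{5G}$ and $l_{total}$ are already known (the hypothetical `ConfigEstimate` container), whereas in practice these fluctuate, which is why an RL agent is used. The positive slot duration $\tau$ is omitted because it scales every cost equally and does not change the minimizer.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ConfigEstimate:
    """Hypothetical per-configuration estimates for the current time slot."""
    e_wd: float     # wearable energy term
    e_p: float      # mobile terminal energy term
    c_5g: float     # 5G connection cost
    l_total: float  # predicted end-to-end latency (s)


def objective(k: ConfigEstimate, alpha: float, variant: str = "full") -> float:
    """Objective variants of [0134], [0141] and [0142]."""
    if variant == "full":
        return alpha * (k.e_wd + k.e_p) + k.c_5g
    if variant == "energy":    # transmission costs not considered
        return alpha * (k.e_wd + k.e_p)
    if variant == "wearable":  # only the wearable device energy
        return alpha * k.e_wd
    raise ValueError(f"unknown objective variant: {variant}")


def select_configuration(configs: Sequence[ConfigEstimate], alpha: float, l_max: float,
                         variant: str = "full") -> Tuple[Optional[int], List[int]]:
    """Return (index, one-hot x) of the cheapest feasible configuration, or (None, zeros)."""
    best_i: Optional[int] = None
    best_cost = float("inf")
    for i, k in enumerate(configs):  # one pass over |K| configurations: O(|K|)
        if k.l_total > l_max:        # latency constraint l_total <= L_max
            continue
        cost = objective(k, alpha, variant)
        if cost < best_cost:
            best_i, best_cost = i, cost
    x = [1 if i == best_i else 0 for i in range(len(configs))]
    return best_i, x
```

For example, with made-up estimates where the all-local configuration misses the latency bound, a low energy price favours the split that avoids 5G charges, while a higher price can shift the choice toward the cloud.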
[0144] The decision problem illustrated above has been formulated as a discrete-time, infinite-horizon Markov Decision Process (MDP). It can be defined by a 5-tuple $(S, A, P, c, \gamma)$, where $S$ is the infinite set of all possible states, $A(s)$ is the finite set of all possible actions in state $s$, $P(s' \mid s, a)$ is the probability of transitioning from a state $s$ to a state $s'$ given an action $a \in A(s)$, $c(s, a, s')$ is the immediate cost incurred when action $a$ is executed in state $s$ and the system transitions to state $s'$, and $\gamma \in [0, 1]$ is a discount factor that adjusts the importance of future costs.
[0145] The RL agent state may be defined as $s = (r_{WiFi7}, r_{5G}, l_{wd}, l_p, l_{cloud})$.
[0146] The RL agent state may comprise fewer or more parameters than those listed above, depending on the complexity required by the controlled application.
[0147] Among these, the data rates $r_{WiFi7}$ and $r_{5G}$ and the execution time $l_{cloud}$ in the cloud server 130 are exogenous parameters that are not under the agent's control. Their variability is therefore independent of the chosen actions and is simply observed from the environment. Moreover, it is assumed that the electronic mobile terminal 120 works in light-load conditions, i.e., there are no other running tasks competing for its resources. On the other hand, $l_{wd}$ and $l_p$ change according to the actions, but their values are preset. An action consists in selecting a configuration $k_i$, thus setting the corresponding variable $x_i$ to 1. Therefore, the number of configuration-selection actions equals the number of configurations $|K|$. Note that, in some states, the RL agent may choose not to change the configuration selected in the previous time window; this scenario is modelled by introducing an action $\eta$ that represents the do-nothing choice. The set of all possible actions $A(s)$ is hence given by:
[0148]
$$A(s) = \{a_1, a_2, \dots, a_{|K|}\} \cup \{\eta\}$$
[0149] where each $a_i$ represents the action of setting the variable $x_i$ to 1, thus selecting configuration $k_i$.
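The state of [0145] and the action set of [0148]-[0149] can be encoded compactly. This is a minimal sketch with hypothetical names; encoding the do-nothing action $\eta$ as the extra index $|K|$ is an assumption, chosen so that a policy can output $|K| + 1$ scores. Indices are 0-based, so action `i` selects configuration $k_{i+1}$.

```python
from typing import List, NamedTuple


class AgentState(NamedTuple):
    """State s = (r_WiFi7, r_5G, l_wd, l_p, l_cloud) as listed in [0145]."""
    r_wifi7: float  # Wi-Fi 7 data rate (exogenous, observed)
    r_5g: float     # 5G data rate (exogenous, observed)
    l_wd: float     # execution time on the wearable (changes with the action)
    l_p: float      # execution time on the mobile terminal (changes with the action)
    l_cloud: float  # execution time in the cloud (exogenous, observed)


def action_set(num_configs: int) -> List[int]:
    """Actions 0..|K|-1 select a configuration; action |K| is 'do nothing' (eta)."""
    return list(range(num_configs + 1))


def is_do_nothing(action: int, num_configs: int) -> bool:
    return action == num_configs


def apply_action(action: int, x: List[int]) -> List[int]:
    """Return the one-hot vector x after taking `action` (x has length |K|)."""
    num_configs = len(x)
    if is_do_nothing(action, num_configs):
        return list(x)  # keep the previous configuration: no reconfiguration
    if not 0 <= action < num_configs:
        raise ValueError(f"invalid action {action} for {num_configs} configurations")
    new_x = [0] * num_configs
    new_x[action] = 1  # 0-based: action i selects configuration k_{i+1}
    return new_x
```

Keeping $\eta$ as an explicit action lets the agent avoid the reconfiguration overhead penalized in [0150] when the current configuration is still adequate.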
[0150] A cost $c(s, a, s')$ is associated with each state-action-next-state triple; it embeds the energy costs $c_{e_{wd}}(s, a) = \alpha \, e_{wd}$ and $c_{e_p}(s, a) = \alpha \, e_p$, the 5G connection cost $c_{5G}(s, a)$, and the penalty $c_{lat}(s, a, s')$ that is paid when the maximum latency requirement ($l_{total} \le L_{max}$) is violated. In a given time slot, if the RL agent chooses an action different from $\eta$, the system must be reconfigured, which implies some overhead. Therefore, a reconfiguration penalty $c_{rcfg}$ is introduced, defined as $c_{rcfg} = 1$ if $a \ne \eta$ and $c_{rcfg} = 0$ otherwise, meaning that a penalty equal to 1 is paid every time the chosen action differs from $\eta$. Similarly, the cost $c_{lat}(s, a, s')$ incurred when the system violates the latency constraint can simply be defined as $c_{lat}(s, a, s') = 1$ if $l_{total} > L_{max}$ and 0 otherwise.
[0151] The cost $c(s, a, s')$ is given by:
[0152]
$$c(s, a, s') = w_{e_{wd}} \frac{c_{e_{wd}}}{c_{e_{wd},max}} + w_{e_p} \frac{c_{e_p}}{c_{e_p,max}} + w_{5G} \frac{c_{5G}}{c_{5G,max}} + w_{lat} \, c_{lat} + w_{rcfg} \, c_{rcfg}$$
[0155] where $w_{e_{wd}}$, $w_{e_p}$, $w_{5G}$, $w_{lat}$ and $w_{rcfg}$ are non-negative weights that sum to 1, and $c_{e_{wd},max}$, $c_{e_p,max}$ and $c_{5G,max}$ are the normalization parameters for the energy cost on the wearable device 110, the energy cost on the electronic mobile terminal 120 and the cost of the 5G connection, respectively. When the battery level is high, the RL agent should promote local computation on the wearable device 110, whereas, when the battery level drops, the RL agent should prioritize offloading the computation to the electronic mobile terminal 120 or to the cloud server 130. Preferably, the RL agent is implemented through one of value-based algorithms, policy-based algorithms, model-based algorithms and Actor-Critic-based algorithms. Value-based algorithms directly learn the state value or the state-action value and act greedily by choosing the best action in each state; they use exploration to understand the environment and converge to optimal or near-optimal solutions. Policy-based algorithms learn a stochastic policy mapping states to actions; the agent acts by sampling actions from the learned policy. Model-based algorithms learn a model of the environment and plan actions according to the learned model. Actor-Critic-based algorithms combine aspects of both policy-based methods (the Actor) and value-based methods (the Critic). The Actor decides which action to take given the current state and updates the policy directly, using the feedback from the Critic. The Critic evaluates the action taken by the Actor by computing a value function, such as the expected reward or the value of the state-action pair, and provides feedback to the Actor on how good or bad the action was. The Actor updates its policy based on the Critic's evaluation, improving action selection over time. This method benefits from the Actor's ability to improve the policy directly and from the Critic's role in reducing the variance of policy updates, leading to more stable and efficient learning.
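The normalized per-step cost of [0150]-[0155] can be sketched as follows. This is a minimal illustration with hypothetical names: the energy terms and the 5G cost are passed in already computed, and the normalization constants are supplied by the caller, since the description does not state how they are chosen.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CostWeights:
    """Non-negative weights that must sum to 1, as required in [0155]."""
    w_ewd: float
    w_ep: float
    w_5g: float
    w_lat: float
    w_rcfg: float

    def __post_init__(self) -> None:
        ws = (self.w_ewd, self.w_ep, self.w_5g, self.w_lat, self.w_rcfg)
        if any(w < 0 for w in ws) or not math.isclose(sum(ws), 1.0):
            raise ValueError("weights must be non-negative and sum to 1")


def transition_cost(e_wd: float, e_p: float, c_5g: float, l_total: float,
                    reconfigured: bool, alpha: float, l_max: float, w: CostWeights,
                    c_ewd_max: float, c_ep_max: float, c_5g_max: float) -> float:
    """Weighted sum of normalized costs and binary penalties, c(s, a, s')."""
    c_ewd = alpha * e_wd                        # energy cost on the wearable
    c_ep = alpha * e_p                          # energy cost on the mobile terminal
    c_lat = 1.0 if l_total > l_max else 0.0     # latency violation penalty
    c_rcfg = 1.0 if reconfigured else 0.0       # 1 when the action is not eta
    return (w.w_ewd * c_ewd / c_ewd_max
            + w.w_ep * c_ep / c_ep_max
            + w.w_5g * c_5g / c_5g_max
            + w.w_lat * c_lat
            + w.w_rcfg * c_rcfg)
```

Shifting weight between `w_ewd` and `w_ep` is one possible way to express the battery-dependent preference described above, although the description does not specify the mechanism.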
[0156] Preferably, the value-based algorithms comprise the following algorithms: DQN (Deep Q Learning), DDQN (Double Deep Q Learning), DQN+ATT (Deep Q Learning with attention), Rainbow.
[0157] Preferably, the Actor-Critic-based algorithms comprise the following algorithms: A2C (Advantage Actor Critic), PPO (Proximal Policy Optimization).
[0158] More preferably, the RL agent is implemented through an Actor-Critic-based algorithm.
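The preferred algorithms named above (A2C, PPO) normally use neural-network actors and critics. As a much smaller illustration of the same Actor/Critic interplay, the sketch below implements a one-step tabular actor-critic with a softmax policy; it is not A2C or PPO. Assumptions: the continuous state of [0145] has been discretized into hashable labels by some upstream step, costs are minimized by using the reward $r = -c(s, a, s')$, and the learning rates, discount and toy costs are illustrative values not taken from the description.

```python
import math
import random
from collections import defaultdict


class TabularActorCritic:
    """One-step actor-critic: softmax Actor over preferences H(s, a), Critic V(s)."""

    def __init__(self, num_actions, actor_lr=0.1, critic_lr=0.1, gamma=0.9, seed=0):
        self.num_actions = num_actions
        self.actor_lr = actor_lr
        self.critic_lr = critic_lr
        self.gamma = gamma
        self.rng = random.Random(seed)
        self.prefs = defaultdict(lambda: [0.0] * num_actions)  # Actor: H(s, a)
        self.values = defaultdict(float)                       # Critic: V(s)

    def policy(self, state):
        """Softmax over action preferences (max-subtracted for numerical stability)."""
        h = self.prefs[state]
        m = max(h)
        exp_h = [math.exp(v - m) for v in h]
        total = sum(exp_h)
        return [v / total for v in exp_h]

    def act(self, state):
        """Sample an action from the current stochastic policy."""
        return self.rng.choices(range(self.num_actions), weights=self.policy(state), k=1)[0]

    def update(self, state, action, cost, next_state):
        """Critic computes the TD error; Actor follows the log-softmax gradient."""
        td_error = -cost + self.gamma * self.values[next_state] - self.values[state]
        self.values[state] += self.critic_lr * td_error
        probs = self.policy(state)
        h = self.prefs[state]
        for b in range(self.num_actions):
            grad = (1.0 if b == action else 0.0) - probs[b]
            h[b] += self.actor_lr * td_error * grad
        return td_error


# Toy usage: one discretized state and three configurations with fixed,
# made-up costs; configuration index 1 is the cheapest.
toy_costs = [0.9, 0.2, 0.6]
agent = TabularActorCritic(num_actions=3, seed=1)
for _ in range(5000):
    a = agent.act("s0")
    agent.update("s0", a, toy_costs[a], "s0")
probs = agent.policy("s0")
```

Here the Critic's value estimate acts as a baseline that reduces the variance of the Actor's updates, mirroring the role described in [0155]; after training, the policy concentrates on the cheapest configuration.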
[0159] From the above description, the features of the processing system for executing an artificial intelligence application at an execution rate, which is the object of the present invention, are clear, as are the related advantages.
[0160] Indeed, after the first execution of the AI application distributed among its computing devices, the processing system is capable of modifying the computational load distribution for subsequent executions so as to optimize the energy consumption, the energy cost and possibly also the transmission cost, while still respecting the application time limit. This makes it possible to adapt the distribution of the computational load to the specific requirements of the application in terms of application time and to the actual resource occupancy of the computing devices. Finally, it is clear that the processing system for executing an artificial intelligence application at an execution rate thus conceived is susceptible to numerous modifications and variations, all of which are within the scope of the invention; moreover, all the details may be replaced by technically equivalent elements. In practice, the materials used, as well as their dimensions, can be of any type according to the technical requirements.
Claims
1) Processing system (100) for executing an artificial intelligence or AI application at an execution rate related to an application time, said processing system comprising:
- a wearable device (110) comprising a first electronic unit (111) and a first battery (113);
- an electronic mobile terminal (120) connected to the wearable device (110) through a wireless connection, said electronic mobile terminal (120) comprising a second electronic unit (121) and a second battery (122);
- a cloud or edge server (130) connected to the electronic mobile terminal (120) through a telecommunications network (150), said server (130) comprising a third electronic unit (131);
wherein
- the processing system is configured so that, at the first request of execution of the AI application, the AI application is executed by the first electronic unit (111), the second electronic unit (121) and the third electronic unit (131) according to a first predefined computational load distribution;
- the second electronic unit (121) is programmed for executing, after the first execution of the AI application, a Reinforcement Learning (RL) agent configured for determining at least one second distribution of the computational load on the basis of wearable device state data, electronic mobile terminal state data, server state data, wireless connection state data and telecommunication network state data so as to minimize the energy consumption of the first battery while maintaining an end-to-end latency smaller than the application time, the first electronic unit (111), the second electronic unit (121) and the third electronic unit (131) being programmed for executing the AI application according to the at least one second distribution of the computational load.
2) Processing system (100) according to claim 1 wherein the wearable device (110) comprises a data acquisition system connected to the first electronic unit (111), which is configured to acquire first input data intended to be processed by the AI application.
3) Processing system (100) according to claim 1 or 2 wherein the wearable device (110) is configured for transmitting wearable device state data to the electronic mobile terminal (120), and the electronic mobile terminal (120) is configured for collecting wearable device state data, server state data, electronic mobile terminal state data, wireless connection state data and telecommunication network state data, the data collection occurring periodically with a control period τ.
4) Processing system (100) according to one or more of the previous claims wherein the determination of the at least one second distribution of the computational load is carried out so as to also minimize the power consumption of the second battery.
5) Processing system (100) according to one or more of the previous claims wherein the determination of the at least one second distribution of the computational load is carried out so as to also minimize the costs related to the use of the telecommunication network (150) and of the electric energy.
6) Processing system (100) according to one or more of the previous claims wherein the execution of the RL agent is initially periodic with a period equal to the control period τ.
7) Processing system (100) according to claim 6 wherein the second electronic unit (121) is configured for recording the execution time of the AI application and comparing it with the application time, the second electronic unit (121) recording a violation if the execution time is greater than the application time, the second electronic unit (121) being also configured for immediately executing the RL agent if the number of consecutive violations exceeds a predefined threshold value.
8) Processing system (100) according to one or more of the previous claims wherein the RL agent is implemented through one of value-based algorithms, policy-based algorithms, model-based algorithms and Actor-Critic-based algorithms.
9) Processing system (100) according to claim 8 wherein the value-based algorithms comprise the following algorithms: DQN, DDQN, DQN+ATT, Rainbow, and the Actor-Critic-based algorithms comprise the following algorithms: A2C, PPO.
10) Processing system (100) according to claim 8 or 9 wherein the RL agent is implemented through an Actor-Critic-based algorithm.