Training node and method for distributed and / or federated ai / ML model training in a communication network

By implementing a training node that dynamically selects clients based on resource profiles and convergence criteria within the 5G core network's Model D architecture, the energy efficiency and model convergence issues in federated learning are addressed, achieving significant reductions in energy use and improved training performance.

WO2026142492A1 · PCT designated stage · Publication Date: 2026-07-02 · TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)


Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
Filing Date: 2024-12-27
Publication Date: 2026-07-02

Application Information

Patent Timeline: 27 Dec 2024, Application · 02 Jul 2026, Publication (WO2026142492A1)

IPC: G06N3/098; G06N3/045; G06N20/00; H04L41/14; H04L41/16

AI Tagging

Technology Topics

Engineering · Artificial intelligence




AI Technical Summary

Technical Problem

Federated learning in wireless communication networks faces challenges such as high energy consumption, longer training times, and biased model convergence due to heterogeneous worker capabilities and non-IID data distribution, which are not efficiently addressed by current 5G core network architectures.

Method used

The introduction of a training node that calculates training scores for clients, updates dictionaries based on these scores, and selectively chooses clients for further training or stops training based on convergence, utilizing the 5G communication model 'Model D' and SCP functionalities to manage resource profiles and client selection dynamically.

Benefits of technology

This approach reduces energy consumption by up to 50% and improves model convergence by aligning with sustainability goals, while mitigating stragglers and bias, ensuring efficient and parallel training of AI/ML models across heterogeneous clients.

✦ Generated by Eureka AI based on patent content.



Abstract

A method for handling training of a Machine Learning, ML, model in a communication network is provided. A respective training score is calculated for one or more first clients from a set of clients. The one or more first clients have performed training of the ML model. A set of dictionaries is updated based on the respective training scores. One or more subsequent clients are selected from the set of clients based on the set of dictionaries. Responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by the one or more clients, not achieving convergence, further training of the ML model by the one or more subsequent clients is initiated. Responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by one or more clients, achieving convergence, training the ML model is stopped.


Description

[0001] TRAINING NODE AND METHOD FOR DISTRIBUTED AND / OR FEDERATED AI / ML MODEL TRAINING IN A COMMUNICATION NETWORK

[0002] TECHNICAL FIELD

[0003] Embodiments herein relate to a training node and a method therein. In some aspects, they relate to handling distributed and / or federated AI / ML model training in a wireless communication network.

[0004] BACKGROUND

[0005] In a typical wireless communication network, wireless devices, also known as wireless communication devices, mobile stations, stations (STA) and / or User Equipment (UE), communicate via a Wide Area Network or a Local Area Network such as a Wi-Fi network or a cellular network comprising a Radio Access Network (RAN) part and a Core Network (CN) part. The RAN covers a geographical area which is divided into service areas or cell areas, which may also be referred to as a beam or a beam group, with each service area or cell area being served by a radio network node such as a radio access node, e.g., a Wi-Fi access point, a Base Station (BS) or a radio base station (RBS), which in some networks may also be denoted, for example, a NodeB, eNodeB (eNB), or gNodeB (gNB) as denoted in Fifth Generation (5G) telecommunications. A service area or cell area is a geographical area where radio coverage is provided by the radio network node. The radio network node communicates over an air interface operating on a radio frequency with the wireless devices within the range of the radio network node.

[0006] The 3rd Generation Partnership Project (3GPP) is the standardization body for specifying the standards for the cellular system evolution, e.g., including 3G, 4G, 5G and future evolutions. Specifications for Evolved Universal Terrestrial Radio Access (E-UTRA) and Evolved Packet System (EPS) have been completed within 3GPP. In a Fourth Generation (4G) network, EPS is the core network and E-UTRA is the radio access network. In 5G, 5GC is the core network and NR is the radio access network. As a continued network evolution, new releases of 3GPP specify a 5G network, also referred to as 5G New Radio (NR) and 5G Core (5GC).

[0007] Frequency bands for 5G NR are being separated into two different frequency ranges, Frequency Range 1 (FR1) and Frequency Range 2 (FR2). FR1 comprises sub-6GHz frequency bands. Some of these bands are bands traditionally used by legacy standards but have been extended to cover potential new spectrum offerings from 410 MHz to 7125 MHz. FR2 comprises frequency bands from 24.25 GHz to 52.6 GHz. Bands in this millimeter wave range have shorter range but higher available bandwidth than bands in the FR1.

[0008] Multi-antenna techniques may significantly increase the data rates and reliability of a wireless communication system. For a wireless connection between a single user, such as a UE, and a base station (BS), the performance is in particular improved if both the transmitter and the receiver are equipped with multiple antennas, which results in a Multiple-Input Multiple-Output (MIMO) communication channel. This may be referred to as Single-User (SU)-MIMO. In the scenario where MIMO techniques are used for the wireless connection between multiple users and the base station, MIMO enables the users to communicate with the base station simultaneously using the same time-frequency resources by spatially separating the users, which further increases the cell capacity. This may be referred to as Multi-User (MU)-MIMO. Note that MU-MIMO may be beneficial even when each UE has only one antenna. The cell capacity can be increased linearly with respect to the number of antennas at the BS side. For this reason, more and more antennas are employed at the BS. Such systems and / or related techniques are commonly referred to as massive MIMO.

[0009] 5G System, Network Repository Function (NRF) and Service Communication Proxy (SCP)

[0010] The 5G core network system is implemented as software to improve performance at a low production cost. The 5G network architecture is implemented in a Service Based Architecture (SBA) framework, where individual elements are defined as Network Functions (NFs) instead of network entities, see Figure 1. The 5G core makes it possible to quickly introduce new network functionalities, to handle a growing number of connections and to roll out software updates. Through the Service Based Interface (SBI), each NF consumes services offered by other NFs acting as service producers. The 5G SBA uses RESTful APIs with HTTP / 2 as the application layer protocol. The SBA makes it easier to support a wide range of functions and for operators to evolve towards more sophisticated networks.

[0011] The Network Repository Function (NRF) is a key element of the 5G Core network Service Based Architecture (SBA). It acts as a registry or catalog that other network functions (NFs) can consult, so that they can register and discover information about other entities present in the core network, as well as the service capabilities that are required. The NRF maintains an updated repository of all 5G elements in the network. The NRF has four main tasks, executed based on the Management Services and Discovery Services:

[0012] Network function management: The NRF allows each NF to register with the NRF. The NRF maintains the profiles of available NF instances and their supported services.

[0013] Network function discovery: The discovery mechanism allows 5G network NFs to discover each other and get the updated status of the desired elements or NFs. It is important to understand the capabilities, also from one NRF to other NRFs in different networks.

[0014] Authorization: Not every entity can request information from the NRF. Strict authorization processes are implemented in the NRF to respect security and integrity. Several RFCs have been adopted for this purpose. The basic principles are based on the request for, and exchange of, tokens.

[0015] Bootstrapping: The NRF can notify associated NFs of the service endpoints that are supported, to avoid storing data in individual NFs.

[0016] Some other roles listed in the following are also handled by the NRF: data source for SCP routing rules, NF screening, overload handling, notification throttling, interface protection, TLS, API gateway and geo-redundancy.

[0017] The 5G core network has introduced a new node called Service Communication Proxy (SCP). The SCP forwards requests to destination based on NRF interaction. SCP includes many functionalities such as:

[0018] Indirect communication,

[0019] Delegated discovery and selection,

[0020] Message forwarding and routing to destination NF / NF service,

[0021] Message forwarding and routing to a next hop SCP,

[0022] Communication security, load balancing, monitoring, overload control etc.

[0023] Communication Model D is a new indirect mode of communication where the service producer and consumer interact via the SCP, which mediates messages between them, see Figure 2 and Figure 3. Figure 2 shows the 5G core Model D communication model. Figure 3 depicts the internal process of the SCP and NRF. After registration and authorization of the NF are granted, the NRF exposes the NF profile to the SCP for routing purposes. The SCP can be considered a 5G service mesh. As the NRF is part of the signaling domain, it also acts as a data source for the SCP, which builds routing rules from the NF status supplied by the NRF. In Model D, consumer NFs send requests to the SCP. The SCP identifies the right producer via the NRF and returns the requested information. This model simplifies the NFs, making it easy to add, swap or change NFs and to scale the 5G core as service requests increase.
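The Model D flow described above can be illustrated with a minimal Python sketch, assuming a toy in-memory NRF and SCP. The classes `NFProfile`, `NRF` and `SCP`, the round-robin choice of producer and the handler callables are hypothetical illustrations, not 3GPP-defined APIs. The point shown is that the consumer names only the target NF type and service, and the SCP performs delegated discovery via the NRF before forwarding the request.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class NFProfile:
    nf_instance_id: str
    nf_type: str                 # e.g. "NWDAF"
    services: set[str]           # service names the producer exposes
    status: str = "REGISTERED"


@dataclass
class NRF:
    """Toy NF repository: stores profiles of registered NF instances."""
    profiles: dict[str, NFProfile] = field(default_factory=dict)

    def register(self, profile: NFProfile) -> None:
        self.profiles[profile.nf_instance_id] = profile

    def discover(self, nf_type: str, service: str) -> list[NFProfile]:
        return [p for p in self.profiles.values()
                if p.nf_type == nf_type and service in p.services
                and p.status == "REGISTERED"]


class SCP:
    """Toy Model D proxy: resolves the producer via the NRF, then forwards."""

    def __init__(self, nrf: NRF, handlers: dict[str, Callable[[dict], object]]):
        self.nrf = nrf
        self.handlers = handlers     # nf_instance_id -> request handler (hypothetical)
        self._rr = 0                 # round-robin counter for simple load balancing

    def forward(self, nf_type: str, service: str, request: dict):
        candidates = self.nrf.discover(nf_type, service)   # delegated discovery
        if not candidates:
            raise LookupError(f"no {nf_type} instance offers {service}")
        producer = candidates[self._rr % len(candidates)]
        self._rr += 1
        return self.handlers[producer.nf_instance_id](request)
```

Because the consumer never addresses a producer instance directly, registering or removing an NWDAF instance only changes what the NRF returns, which is the property attributed to Model D above.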

[0024] SCP has the following roles:

[0025] Routing Control, Load balancing

[0026] Congestion Control

[0027] Traffic Prioritization

[0028] Mediation

[0029] Encrypted HTTPS Monitoring.

[0030] The Network Data Analytics Function (NWDAF)

[0031] As the 5G network is designed to handle various types of services and applications, its management has become challenging and complicated. Network automation and intelligence technologies that analyze collected data and automatically optimize complex network management based on AI / ML models are therefore highly required. 3GPP has defined the Network Data Analytics Function (NWDAF) for this purpose. The NWDAF is a new standalone 3GPP network function that provides real-time operational intelligence in the 5G core network.

[0032] The 5G system architecture allows an NWDAF containing an Analytics Logical Function (AnLF) to use a trained AI / ML model provisioned by another NWDAF containing a Model Training Logical Function (MTLF). The AnLF performs inference, derives analytics information and exposes analytics services. The MTLF trains AI / ML models and exposes new training services.

[0033] Horizontal Federated Machine Learning (H-FL)

[0034] Federated Learning (FL) enables AI / ML model training at the network nodes by exploiting large-scale distributed data and computing resources. Federated learning also restricts explicit data sharing, so that the confidentiality and privacy associated with the use case are preserved. FL differs from classical AI / ML in four main domains: data privacy, e.g., no end-user data leaves the device, worker, node or client; data distribution, e.g., data could be IID or non-IID; continual learning, e.g., the communication time between a client and the central server may be too long to provide a satisfactory user experience; and aggregation of data, e.g., some privacy notions and rules are violated when user data aggregation occurs in the central server. Federated Learning requires the devices in the worker set $\mathcal{K}$ to upload parameters, which are aggregated iteratively to train the global model. In such a scenario, distributed devices, e.g., mobile devices or workers, collaborate to train a common AI / ML model under the coordination of an access point (AP) or parameter server.

[0035] FL occurs over multiple communication rounds (encapsulated into upload cost and download cost) and computation rounds. In each training round, a five-stage process is repeated until model convergence, as shown in Figure 4. Figure 4 shows a typical Federated Learning architecture with three workers A, B and C.

[0036] Step 1: FL starts when a training task is created by the server (coordinator), which initializes the parameters of the global model ($\omega_t$) and sends them to each worker (client or participant), incurring the first download cost.

[0037] Step 2: Each worker $k \in \mathcal{K}$ (participant) independently trains on its local dataset $\mathcal{D}_k$ of size $n_k$ to minimize the loss on its local data distribution. This is performed by computing the average gradient $g_k$ with the current model $\omega_t$ using one or more steps of stochastic gradient descent (SGD). The local worker model update $\omega^{k}_{t+1}$ is given by EQ 1, with the learning rate $\eta$:

[0038] $$\omega^{k}_{t+1} = \omega_t - \eta \, g_k \qquad (1)$$
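As a minimal sketch of the local update in EQ 1, the following assumes a linear model with a squared-error loss; the model, the loss and the function name `local_update` are illustrative assumptions. For simplicity it computes $g_k$ as the full-batch average gradient over the local dataset rather than a stochastic mini-batch estimate.

```python
def local_update(w_global: list[float], data: list[tuple[list[float], float]],
                 lr: float, steps: int = 1) -> list[float]:
    """Apply EQ 1, w_{t+1}^k = w_t - eta * g_k, for `steps` local iterations.

    Assumed model: prediction = dot(w, x); assumed loss: mean of 0.5 * (pred - y)^2.
    """
    if not data:
        raise ValueError("local dataset is empty")
    w = list(w_global)            # start from the received global model w_t
    n_k = len(data)
    for _ in range(steps):
        g_k = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                g_k[j] += err * xj / n_k       # average gradient over local data
        w = [wi - lr * gj for wi, gj in zip(w, g_k)]   # EQ 1
    return w
```

For the toy dataset `[([1.0], 2.0), ([2.0], 4.0)]` and `lr=0.1`, one step from `[0.0]` moves the weight to 0.5, and repeated steps approach the least-squares solution 2.0.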

[0041] Step 3: Each worker submits its local model $\omega^{k}_{t+1}$ to the server (coordinator), incurring the upload cost.

[0044] Step 4: The global model is consolidated by aggregating local models received from workers by the server:

[0045] $$\omega_{t+1} = \sum_{k=1}^{M} \frac{n_k}{N} \, \omega^{k}_{t+1} \qquad (2)$$
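The aggregation in EQ 2 is a dataset-size-weighted average of the local models. A minimal sketch, with models represented as plain lists of parameters (the function name `aggregate` is illustrative):

```python
def aggregate(local_models: list[list[float]], sizes: list[int]) -> list[float]:
    """EQ 2: w_{t+1} = sum_k (n_k / N) * w_{t+1}^k, with N = sum_k n_k."""
    if not local_models or len(local_models) != len(sizes):
        raise ValueError("need one dataset size n_k per local model")
    N = sum(sizes)
    if N <= 0:
        raise ValueError("total dataset size N must be positive")
    dim = len(local_models[0])
    return [sum(n_k * w_k[j] for w_k, n_k in zip(local_models, sizes)) / N
            for j in range(dim)]
```

With `aggregate([[1.0, 0.0], [3.0, 2.0]], [1, 3])` the second worker, holding three times as much data, pulls the result to `[2.5, 1.5]`; with equal $n_k$ the rule reduces to a plain average.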

[0048] Step 5: The global model is then dispatched back to the workers, incurring the second download cost. This updated global model is used by each worker in the next training round.

[0049] To achieve the goal in Step 2, FL trains a global model that minimizes the average loss across parties, which is expressed as:

[0050] $$F(\omega^{*}) = \min_{\omega} \sum_{k=1}^{M} \frac{n_k}{N} \, F_k(\omega; \mathcal{D}_k) \qquad (3)$$

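[0053] Where $N = \sum_{k=1}^{M} n_k$ is the sum of all local training dataset sizes, $M$ is the number of participating workers, and $F_k$ is the local loss of worker $k$ on its dataset $\mathcal{D}_k$.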

[0054] Steps 2, 3, 4 and 5 constitute a single round of FL. In subsequent training iterations, the process is repeated until the training loss or the model converges, a time limit is exceeded, or the maximum number of iterations is reached. This approach stands in contrast to traditional centralized AI / ML techniques, where all the local datasets are uploaded to one server, and thus allows critical issues such as data privacy, data security and data access rights to be addressed.
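The stopping criteria listed above can be sketched as a round loop, assuming a hypothetical callback `run_round(t)` that performs Steps 2 to 5 for round `t` and returns the global training loss. Judging convergence by the change in loss between consecutive rounds, and the default values of `tol` and `max_rounds`, are assumptions made for illustration.

```python
import time
from typing import Callable, Optional


def run_fl(run_round: Callable[[int], float], tol: float = 1e-4,
           max_rounds: int = 100, time_limit_s: Optional[float] = None):
    """Repeat FL rounds until a stopping condition holds.

    Returns (rounds_run, last_loss, reason).
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    start = time.monotonic()
    prev_loss = None
    for t in range(1, max_rounds + 1):
        loss = run_round(t)
        if prev_loss is not None and abs(prev_loss - loss) < tol:
            return t, loss, "converged"        # training loss has converged
        if time_limit_s is not None and time.monotonic() - start > time_limit_s:
            return t, loss, "time_limit"       # wall-clock budget exceeded
        prev_loss = loss
    return max_rounds, loss, "max_rounds"      # iteration cap reached
```

For a loss that halves every round, `run_fl(lambda t: 1.0 / 2 ** t, tol=1e-3)` stops at round 10, the first round whose improvement falls below the tolerance.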

[0055] HFL among multiple NWDAFs

[0056] If multiple NWDAF instances are deployed, an NWDAF can act as an aggregation point (Aggregator NWDAF or NWDAF server) and collect analytics information from other NWDAFs, which may have different serving areas, to produce aggregated analytics per Analytics ID. In such a deployment, an NWDAF may be equipped with AnLF and / or MTLF. This multiple deployment is a well-suited framework for horizontal federated learning (HFL). HFL among multiple NWDAFs is an AI / ML technique in the core network that trains an AI / ML model across multiple decentralized NWDAF entities that hold local datasets, without exchanging or sharing those datasets, but sharing the same feature spaces.

[0057] For FL supported by multiple NWDAFs containing a Model Training Logical Function (MTLF) and an Analytics Logical Function (AnLF), one NWDAF containing MTLF acts as FL server (server NWDAF), and multiple NWDAFs containing MTLF act as FL clients or workers (client NWDAFs). The MTLF allows an NWDAF to train AI / ML models, and the AnLF allows an NWDAF to provide analytics results based on the trained model. The main functions of this FML, following the steps in the section above, include:

[0058] 1) FL server NWDAF

[0059] Discovers and selects FL client NWDAFs to participate in an FML procedure, see [5], clause 5.2.

[0060] Requests FL client NWDAFs to perform local training and report local model information.

[0061] Generates the global AI / ML model by aggregating local model information from FL client NWDAFs.

[0062] Sends the global AI / ML model back to FL client NWDAFs and repeats training iterations until the global model converges.

[0063] 2) FL client NWDAFs

[0064] Locally trains the ML model as instructed by the server NWDAF, using the available local dataset, which includes data that is not allowed to be shared with others.

[0065] Reports trained AI / ML model information to the FL server NWDAF.

Receives the updated global AI / ML model from the FL server NWDAF and repeats training iterations if needed.

[0066] The sequence diagram of NWDAF-based FL is depicted in Figure 5. As shown in Figure 5, FL has three main phases: a discovery phase (step 0), a selection phase (step 1), and the FML process proper, which is repeated recursively until model convergence (steps 2 to 8).

[0067] In a multiple-NWDAF deployment scenario, the server NWDAF has the “Analytic Aggregation Capability” registered in its “NF profile” within the NRF. The server NWDAF supports requesting an exchange of “Analytic Metadata Information” between NWDAFs. The server NWDAF also supports dataset statistical properties, output strategy and data time window parameters per type of analytics.

[0068] The Network repository function stores the NF Profiles of NWDAF instances, including “Analytic Aggregator Capability” for Aggregator or server NWDAF, and “Analytics Metadata Provisioning Capability” when supported by the NWDAF. As per request by the server NWDAF, NRF returns the NWDAFs matching the attributes provided in the Nnrf_NFDiscovery_Request.
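The attribute-matching discovery described above can be illustrated with a toy filter over stored NF profiles. This is a sketch only: the field names `nfInstanceId` and `nfType` are modelled loosely on NRF profile attributes, while the capability keys used below (`analyticsAggregation`, `mtlf`) are hypothetical stand-ins for the capabilities named in the text, not the actual Nnrf_NFDiscovery query parameters.

```python
def nf_discovery(profiles: list[dict], target_nf_type: str, **required) -> list[str]:
    """Return the nfInstanceId of every stored profile of the requested type
    whose attributes match all `required` key/value pairs."""
    return [p["nfInstanceId"] for p in profiles
            if p.get("nfType") == target_nf_type
            and all(p.get(key) == value for key, value in required.items())]


# Hypothetical profiles; capability keys are illustrative only.
profiles = [
    {"nfInstanceId": "nwdaf-srv", "nfType": "NWDAF", "analyticsAggregation": True},
    {"nfInstanceId": "nwdaf-a", "nfType": "NWDAF", "analyticsAggregation": False, "mtlf": True},
    {"nfInstanceId": "nwdaf-b", "nfType": "NWDAF", "mtlf": True},
    {"nfInstanceId": "smf-1", "nfType": "SMF"},
]
```

Here a server NWDAF looking for training-capable clients would query `nf_discovery(profiles, "NWDAF", mtlf=True)` and receive `["nwdaf-a", "nwdaf-b"]`.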

[0069] SUMMARY

[0070] As a part of developing embodiments herein a problem was identified by the inventors and will first be discussed.

[0071] FL appears as a revolutionary approach in the distributed AI / ML paradigm, from the perspectives of data privacy and lack of training data. However, FL has many architectural design challenges, especially in the interaction between the NWDAF acting as central server and the NWDAFs acting as workers in each communication round, see [5]. FL communication overhead, or cost, is a considerable challenge. Moreover, every data exchange involves energy consumption, and large energy bills may be incurred for communication and computation. In some cases, computations lead to a considerable carbon footprint without significantly improving the model accuracy over consecutive training iterations. Therefore, FL among multiple NWDAFs has many drawbacks:

[0072] Energy efficiency (communication and computation efficiency) during FL of AI / ML models: A large number of communication rounds between the central server and the local workers consumes a high amount of energy. It can also lead to longer training times (during synchronous FL), especially in large-scale FL scenarios.

[0073] High availability of AI / ML models: There is no capability inside the 5GC to train multiple models in parallel under FL.

[0074] Statistical and system heterogeneity: Differences in storage, communication and computation capabilities are a big challenge and may lead to biased training. Multiple variations of data present across workers might cause problems in the data structuring, modeling and inference phases.

[0075] Embodiments presented in this disclosure aim to address the challenges presented above. Currently, the energy footprint of AI / ML training has been analyzed in various setups. Regardless of whether the network architecture is server-client or decentralized with consensus methods, distributed learning requires many communication rounds for convergence.

[0076] An object of embodiments herein is to improve the performance in a wireless communication network.

[0077] According to an aspect of embodiments herein, the object is achieved by a method for handling training of a Machine Learning, ML, model in a communication network.

[0078] The training node calculates a respective training score for one or more first clients from a set of clients. The one or more first clients have performed training of the ML model.

[0079] The training node updates a set of dictionaries based on the respective training scores.

[0080] The training node selects one or more subsequent clients from the set of clients based on the set of dictionaries.

[0081] Responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by the one or more clients, not achieving convergence, the training node initiates further training of the ML model by the one or more subsequent clients, or

[0082] Responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by one or more clients, achieving convergence, the training node stops training the ML model.

According to an aspect of embodiments herein, the object is achieved by a training node configured to handle training of a Machine Learning, ML, model in a communication network.

[0083] The training node is configured to calculate a respective training score for one or more first clients from a set of clients. The one or more first clients have performed training of the ML model.

[0084] The training node is configured to update a set of dictionaries based on the respective training scores.

[0085] The training node is configured to select one or more subsequent clients from the set of clients based on the set of dictionaries.

[0086] Responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by the one or more clients, not achieving convergence, the training node is configured to initiate further training of the ML model by the one or more subsequent clients.

[0087] Responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by one or more clients, achieving convergence, the training node is configured to stop training the ML model.
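The method steps above (calculate training scores, update a set of dictionaries, select subsequent clients, then continue or stop) can be sketched as follows. The text here does not define the training score or the contents of the dictionaries, so everything concrete in this sketch is an assumption: the score is taken as the local loss reduction weighted by a resource availability value in [0, 1], the two dictionaries hold each client's latest score and participation count, selection is deterministic top-k with a participation penalty, and convergence is judged on the change in global loss. `TrainingNode` and its parameters are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingNode:
    """Hypothetical sketch of the claimed loop, under the assumptions above."""
    k: int                         # clients selected per round (assumed)
    penalty: float = 0.1           # rank penalty per past participation (assumed)
    tol: float = 1e-3              # convergence tolerance on global loss (assumed)
    scores: dict = field(default_factory=dict)         # client -> latest training score
    participation: dict = field(default_factory=dict)  # client -> rounds participated

    @staticmethod
    def training_score(loss_before: float, loss_after: float, availability: float) -> float:
        # Assumed score: local loss reduction weighted by resource availability.
        return max(loss_before - loss_after, 0.0) * availability

    def update_dictionaries(self, reports: dict) -> None:
        # reports: client -> (loss_before, loss_after, availability) from the first clients
        for client, (before, after, avail) in reports.items():
            self.scores[client] = self.training_score(before, after, avail)
            self.participation[client] = self.participation.get(client, 0) + 1

    def select(self, all_clients: list) -> list:
        # Never-scored clients rank first (inf), so every client is tried once.
        def rank(c):
            return self.scores.get(c, float("inf")) - self.penalty * self.participation.get(c, 0)
        return sorted(all_clients, key=rank, reverse=True)[: self.k]

    def step(self, all_clients, reports, prev_global_loss, global_loss):
        self.update_dictionaries(reports)
        subsequent = self.select(all_clients)
        if abs(prev_global_loss - global_loss) < self.tol:
            return "stop", []                  # aggregated model converged
        return "continue", subsequent          # initiate training by these clients
```

In a first round where only clients `a` and `b` have reported, the untried clients `c` and `d` are selected next; once the global loss change drops below `tol`, `step` returns `("stop", [])`.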

[0088] BRIEF DESCRIPTION OF THE DRAWINGS

[0089] Examples of embodiments herein are described in more detail with reference to attached drawings in which:

[0090] Figure 1 illustrates an example according to prior art.

[0091] Figure 2 illustrates an example according to prior art.

[0092] Figure 3 illustrates an example according to prior art.

[0093] Figure 4 illustrates an example according to prior art.

[0094] Figure 5 illustrates an example according to prior art.

[0095] Figure 6 is a schematic block diagram illustrating embodiments of a communication network.

[0096] Figure 7 is a flowchart depicting embodiments of a method in a training node.

[0097] Figure 8 illustrates an example according to embodiments herein.

[0098] Figure 9 illustrates an example according to embodiments herein.

[0099] Figure 10 illustrates an example according to embodiments herein.

[0100] Figure 11a illustrates an example according to embodiments herein.

Figure 11b illustrates an example according to embodiments herein.

[0101] Figure 12 illustrates an example according to embodiments herein.

[0102] Figure 13 is a schematic block diagram illustrating embodiments of a training node.

Figure 14 is a schematic block diagram illustrating embodiments of a client selection function.

[0103] Figure 15 is a schematic block diagram illustrating embodiments of a client profile function.

[0104] Figure 16 is a schematic block diagram illustrating embodiments of a client resource function.

[0105] Figure 17 shows an example of a communication system QQ100 in accordance with some embodiments.

[0106] Figure 18 shows a UE QQ200 in accordance with some embodiments.

[0107] Figure 19 shows a network node QQ300 in accordance with some embodiments.

[0108] Figure 20 is a block diagram illustrating a virtualization environment QQ400 in which functions implemented by some embodiments may be virtualized.

[0109] DETAILED DESCRIPTION

[0110] Embodiments herein relate to AI / ML model training in communication network.

[0111] FL is known to be less efficient for heterogeneous workers that have different computational capabilities, especially with non-IID data distributions. The local training could become a bottleneck. Embodiments herein relate to the scenario where multiple NWDAFs, such as a server and workers or clients, collaborate to train an AI / ML model in horizontal FL inside the 5G core network.

[0112] Embodiments herein exploit the 5G communication model “Model D”, the SBI and the SBA to introduce the following new functionalities to enable sustainable FL model training:

[0113] - A virtual and dynamic NF profile (for client NWDAFs), DRP4FL (Dynamic Resource Profile for FL): This register is located in the SCP and stores dynamic information about worker resources. The NF (client NWDAF) is instantiated during the global selection (First Layer Selection) of client NWDAFs participating in the FL.

[0114] - A configurable AI / ML model deployed at the SCP, AICS4FL (AI-based Client Selection for FL in each training round). This model is trained offline and deployed to select the client NWDAFs based on their specific characteristics and behavior during training. The strategy is to model a “draw with penalized replacement” of the workers involved in the training in each iteration.

[0115] - Dynamic and Second Layer Selection of server NWDAFs in each training iteration, after inference of the AI / ML model deployed in the SCP.

[0116] - The Resource Information for FL (RI4FL) database: this module hosts the resource evolution of the NFs participating in the FL.

[0117] - A distributed and parallelized capability of the SCP to support multiple and simultaneous FL model training.
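The “draw with penalized replacement” strategy is named but not specified here, and AICS4FL itself is a trained model. The following is therefore only a hedged sketch of one possible sampling rule, swapped in for illustration: clients are drawn without duplicates within a round, in proportion to a base weight (for example a health or contribution score), and are returned to the pool for later rounds with their weight multiplied by `penalty ** count`, where `count` is how often they were selected before. The function name, `weights`, `counts` and `penalty` are hypothetical.

```python
import random
from typing import Optional


def draw_with_penalized_replacement(weights: dict, counts: dict, k: int,
                                    penalty: float = 0.5,
                                    rng: Optional[random.Random] = None) -> list:
    """Pick up to k distinct clients for one round.

    weights: client -> base selection weight (assumed, e.g. a health score)
    counts:  client -> number of earlier selections; updated in place.
    """
    if not 0.0 < penalty <= 1.0:
        raise ValueError("penalty must be in (0, 1]")
    rng = rng or random.Random()
    pool = {c: w * penalty ** counts.get(c, 0) for c, w in weights.items()}
    pool = {c: v for c, v in pool.items() if v > 0}   # drop zero-weight clients
    chosen = []
    for _ in range(min(k, len(pool))):
        clients = list(pool)
        pick = rng.choices(clients, weights=[pool[c] for c in clients], k=1)[0]
        chosen.append(pick)
        del pool[pick]                  # no duplicates within one round
    for c in chosen:
        counts[c] = counts.get(c, 0) + 1    # lower their weight in later rounds
    return chosen
```

Keeping `counts` across rounds is what implements the penalty: a client that was just used remains eligible but becomes progressively less likely to be drawn again, which is the anti-bias behavior the text attributes to the penalization factor.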

[0118] Simulation results have shown the effectiveness and efficacy of the proposal, with a carbon emission reduction of more than 50% when the number of NWDAF clients exceeds ten.
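The RI4FL database, which hosts the resource evolution of participating NFs, and the per-round second-layer selection it feeds can be approximated by a small store of smoothed client metrics. This is a sketch under stated assumptions: the metric names (`latency_s`, `availability`), the exponential moving average (EMA), the health formula and the threshold are all hypothetical, chosen only to show how health details could be consolidated as training evolves.

```python
from dataclasses import dataclass, field


@dataclass
class RI4FLStore:
    """Hypothetical RI4FL-like store: EMA of resource metrics per client NWDAF."""
    alpha: float = 0.3                            # weight of the newest report (assumed)
    records: dict = field(default_factory=dict)   # client -> {metric name: EMA value}

    def report(self, client: str, **metrics: float) -> None:
        rec = self.records.setdefault(client, {})
        for name, value in metrics.items():
            prev = rec.get(name)
            rec[name] = value if prev is None else self.alpha * value + (1 - self.alpha) * prev

    def health(self, client: str, max_latency_s: float) -> float:
        # Assumed health score in [0, 1]: latency headroom times availability.
        rec = self.records.get(client)
        if not rec:
            return 0.0
        latency_ok = max(0.0, 1.0 - rec.get("latency_s", max_latency_s) / max_latency_s)
        return latency_ok * rec.get("availability", 0.0)

    def second_layer(self, first_layer: list, max_latency_s: float, threshold: float) -> list:
        # Keep only first-layer clients whose current health clears the threshold.
        return [c for c in first_layer if self.health(c, max_latency_s) >= threshold]
```

With `alpha=0.5`, two reports of latency 2.0 s and 4.0 s smooth to 3.0 s, so a client with 0.75 average availability scores 0.375 against a 6.0 s latency budget.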

[0119] The NRF supports a discovery mechanism that allows 5G elements to discover each other and get the updated status of the desired elements. Building on this, embodiments herein may promote distributed AI / ML model training that considers sustainability and efficiency in HFL inside the 5G core network. Embodiments herein update and improve the current 5GC architecture to reduce the carbon footprint by:

[0120] - Extending the capabilities of the SCP to allow the management of NWDAF client selection in each training iteration.

[0121] - Extending the network function profile, especially the NWDAF profile, with the Resource Information for FL (RI4FL) in the SCP. This updated version is integrated to dynamically adjust the heartbeat of the NWDAF clients in the DRP4FL. This functionality is managed by the SCP, which is supplied with the new AI-based Client Selection for FL (AICS4FL). This AI / ML model is used to classify the NWDAF clients in each training round.

[0122] - Allowing a two-layer selection of NWDAF clients. The existing first layer identifies all the available NWDAF clients ready to participate in the training. The new second layer evolves sequentially in each training round to select a subset of NWDAF clients, based on data collected by the RI4FL.

[0123] - Enabling a parallel and simultaneous capability of the SCP to handle multiple training instances. The newly proposed functionalities and architecture can support more than two FL training instances. Such an approach supports efficient and prompt model training and deployment when needed.

According to embodiments herein, heterogeneous NWDAF clients in a horizontal FML are assumed. The datasets of participants could be IID or non-IID. The central coordinator NWDAF server can initialize the training following the steps in section. Embodiments herein may bring any one or more of the following advantages:

[0124] Effective Reduction of Energy Consumption: Embodiments herein allow selecting only reliable workers, in terms of their contribution to the learning process of the global model. Non-selected workers are kept idle. More than 35% of global energy is saved with 4 clients, compared to a process without smart selection of participants in each training round. The proposed solution is a path toward sustainable AI and is suitable for real-time applications or real-time continuous training. The energy saved can increase up to 50% as the number of participants becomes large.

[0125] Alignment with the International Telecommunication Union (ITU)’s objective of reducing Information and Communication Technology (ICT) carbon emissions by over 45% by 2030: Energy-efficient solutions are mandatory, under the constraint of maintaining performance.

[0126] Online Decision as Plug-and-Play, Real-Time solution: Embodiments herein bring a more practical solution, where the NWDAF server treats the performance of workers as a black box. As soon as FL starts, the NWDAF server can start to work with our solution and figure out the workers’ capability and reliability through the intelligence in the SCP (SCP-AICS4FL). The SCP-RI4FL is used to consolidate the workers’ attributes or health details as the training evolves.

[0127] Stragglers Mitigation in Distributed Computing: The proposed solution enables intelligent worker (NWDAF client) selection in each training iteration, to efficiently deal with communication bottlenecks, system disturbances and node failures in distributed AI / ML training. A more robust training process is obtained, as the solution can handle non-IID data distributions stored at the workers.

[0128] Bias and Model Drift: The proposed solution is an effective tool to reduce bias. Workers or participants are selected so as not to favor data from a particular participant. A penalization factor is also used to avoid implicitly selecting the same worker over training iterations. The final model captures the contribution of all the workers.

[0129] Solution for Non-IID Data Distribution: In real life, most datasets follow non-IID distributions. It is very challenging to achieve good model convergence on time with such distributions. Our solution handles non-IID data distribution by exploiting the health of NWDAF clients collected at the SCP. The SCP uses our new AI model (AICS4FL) to dynamically select NWDAF clients, relying not only on technical reliability but also on their contribution to the global model.

[0130] Cloud based and Network Operator Implementation as a service: The proposed framework can be generalized to various types of collaborative computing applications offered as a service. A subscriber of such a service has its master node assisted by the framework, which is especially useful for latency-critical applications. Network operators can also use this proposal to reduce the carbon footprint of AI / ML training. The proposed framework could be developed as an rApp.

[0131] Sustainable and privacy-aware CSP Collaboration: With our solution, multiple CSPs can collaborate to train more reliable AI / ML models while keeping the cost as low as possible. The data does not move, so privacy is preserved, and CSPs can obtain a more general model without exposing their local data.

[0132] Generalizable: The proposed solution can be integrated into any FL scheme. For example, a broker such as MQTT could be used to collect the health scores of clients, which are then consumed by the server to perform the selection.

[0133] Multi-instances of FL training capabilities: The proposed solution can handle multiple FL instances. This capability improves AI / ML model readiness, operational efficiency and performance.

[0134] Embodiments herein relate to communication networks in general. Figure 6 is a schematic overview depicting a wireless communication network 100. The wireless communication network 100 comprises one or more RANs and one or more CNs. The wireless communication network 100 may use a number of different technologies, such as Wi-Fi, Long Term Evolution (LTE), LTE-Advanced, 5G, New Radio (NR), Wideband Code Division Multiple Access (WCDMA), Global System for Mobile communications / enhanced Data rate for GSM Evolution (GSM / EDGE), Worldwide Interoperability for Microwave Access (WiMax), or Ultra Mobile Broadband (UMB), just to mention a few possible implementations. Embodiments herein relate to recent technology trends that are of particular interest in a 5G context; however, embodiments are also applicable in further development of the existing wireless communication systems such as e.g. WCDMA and LTE, or to future technologies such as 6G.

[0135] A number of network nodes operate in the communication network 100 such as e.g. network node 111. These nodes provide radio coverage in a number of cells, which may also be referred to as a beam or a beam group of beams. The network node 111 may be any of a NG-RAN node, a transmission and reception point e.g. a base station, a radio access network node such as a Wireless Local Area Network (WLAN) access point or an Access Point Station (AP STA), an access controller, a base station, e.g. a radio base station such as a NodeB, an evolved Node B (eNB, eNode B), a gNB, a base transceiver station, a radio remote unit, an Access Point Base Station, a base station router, a transmission arrangement of a radio base station, a stand-alone access point, a network controlled repeater or any other network unit capable of communicating with a wireless device within the service area served by the network node 111 depending e.g. on the first radio access technology and terminology used. The network node 111 may be referred to as a serving radio network node and communicates with a UE 121 with Downlink (DL) transmissions to the UE 121 and Uplink (UL) transmissions from the UE 121.

[0136] In the communication network 100, one or more wireless devices operate, such as e.g. a UE 121. The UE 121, which may also be referred to as a device, an IoT device, a mobile station, a non-access point (non-AP) STA, a STA, a user equipment and / or a wireless terminal, communicates via one or more Access Networks (AN), e.g. RAN, to one or more core networks (CN). It should be understood by the person skilled in the art that “wireless device” is a non-limiting term which means any terminal, wireless communication terminal, user equipment, Machine Type Communication (MTC) device, Device to Device (D2D) terminal, or node e.g. smart phone, laptop, mobile phone, sensor, relay, mobile tablet or even a small base station communicating within a cell.

[0137] The communication network 100 further comprises a training node 130. The training node 130 may e.g., comprise an SCP 130, such as an SCP-FL 130. The training node 130 may handle training of ML models in the communication network 100. The training node 130 may e.g., comprise, or be connected to, a client selection function 131, a client profile function 132 and / or a client resource function 133. The client selection function 131 may e.g., comprise an AI-based Client Selection for FL (AICS4FL) 131. The client profile function 132 may e.g., comprise a Dynamic Resource Profile for FL (DRP4FL) 132. The client resource function 133 may e.g., comprise a Resource Information for FL (RI4FL) 133.

[0138] Methods herein may be performed by the training node 130, the client selection function 131, the client profile function 132 and / or the client resource function 133. As an alternative, a Distributed Node (DN) and functionality, e.g. comprised in the cloud 190 as shown in Figure 5, may be used for performing or partly performing the methods herein. The above-described problem is addressed in a number of embodiments, some of which may be seen as alternatives, while some may be used in combination.

[0139] Figure 7 shows example embodiments of a method for handling training of a Machine Learning, ML, model in the communication network 100. The method may e.g., be performed by a training node 130, such as the SCP 130. As mentioned above, the training node 130, or SCP 130, may comprise, or be connected to, the client selection function 131, the client profile function 132 and the client resource function 133. Thus, the client selection function 131, the client profile function 132 and the client resource function 133 may all perform parts of the method. The method comprises the following actions, which actions may be taken in any suitable order. Actions that are optional are presented in dashed boxes in Figure 7.

[0140] Action 701

[0141] In some embodiments, responsive to training of the ML model being initiated, a set of dictionaries is initialized. This may comprise creating the set of dictionaries for the training of the ML model.

[0142] Action 702

[0143] A respective training score is calculated for one or more first clients from a set of clients. The one or more first clients have performed training of the ML model. The one or more first clients may e.g., comprise a subset of the clients in the set of clients. The subset comprises a percentage of the clients in the set of clients. The percentage may e.g., be predefined for training the ML model.

[0144] In some embodiments, the training score is calculated based on a number of training metrics associated with the respective one or more first clients. In other words, the metrics are input parameters for calculating the training score. Thus, the metrics will affect the possible selection of subsequent clients for training the ML model.

[0145] In some embodiments, the number of metrics comprises any one or more out of processor load, battery, memory, model drift, training time, upload speed, and energy consumption.

[0146] The processor load may e.g., comprise the processor load of the client while training the ML model.

[0147] The battery may e.g., comprise a state of charge of a battery in the client.

[0148] The memory may e.g., comprise an amount of available memory in the client.

[0149] The model drift may e.g., comprise a parameter indicating the model drift of the ML model as trained by the client. The training time may e.g., comprise the time consumed for training the ML model. The upload speed may e.g., comprise the speed at which the client uploads the ML model it has trained.

[0150] The energy consumption may e.g., comprise the energy consumed while training the ML model.

[0151] In some embodiments, each metric is associated with a weight factor. The training score is calculated by weighting each metric with its associated weight factor. This may e.g., mean that each metric is multiplied by its associated weight factor. The training score may then be calculated by adding the one or more weighted metrics. Thus, the score calculation may be adapted by changing the weight factors.

[0152] In some embodiments, calculating the score comprises applying a penalty factor. The penalty factor may e.g., as explained further below, be based on how many times in a row a client has been selected, or not selected, for training the ML model.

[0153] Action 703

[0154] A set of dictionaries is updated based on the respective training scores.

[0155] In some embodiments, the set of dictionaries comprises a first dictionary and a second dictionary. The first dictionary is related to the training score of the clients in the set of clients. The second dictionary is related to a selection history of the clients in the set of clients.

[0156] In some embodiments, the penalty factor is increased responsive to a client being selected to train the ML model. In other words, being selected several times in a row is penalized.

[0157] In some embodiments, the penalty factor is set to zero responsive to a client not being selected for training the ML model for a number of iterations larger than a threshold. In other words, once a client has not been selected for training the ML model for that number of iterations, any penalty is removed.

[0158] In some embodiments, the selection history comprises the number of times a client has been selected and the number of iterations of training the ML model since the client was last selected. The second dictionary further comprises the penalty factor.

[0159] In some embodiments, updating the first dictionary comprises replacing a previous score of a client with the calculated score. If a previous score of a client is larger than a score threshold, the penalty factor may be applied to the calculated score.

[0160] In some embodiments, updating the second dictionary comprises updating the penalty factor and the selection history.

[0161] Action 704

One or more subsequent clients are selected from the set of clients based on the set of dictionaries. The one or more subsequent clients may e.g., be selected based on the scores in the first dictionary. In some examples, the clients with the highest scores are selected. The one or more subsequent clients may e.g., comprise a subset of the clients in the set of clients. The subset comprises a percentage of the clients in the set of clients. The percentage may e.g., be predefined for training the ML model.

[0162] Action 705

[0163] Responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by the one or more first clients, not achieving convergence, further training of the ML model by the one or more subsequent clients is initiated. In other words, if the ML model has not converged after a training iteration, further training of the ML model is performed. Initiating the further training may comprise sending the aggregated ML model to the selected one or more subsequent clients for training.

[0164] Action 706

[0165] In some embodiments, a respective subsequent training score is calculated for the one or more subsequent clients from the set of clients. The one or more subsequent clients have performed training of the ML model. The one or more subsequent clients may e.g., comprise a subset of the clients in the set of clients. The subset comprises a percentage of the clients in the set of clients. The percentage may e.g., be predefined for training the ML model.

[0166] In some embodiments, the subsequent training score is calculated based on a number of training metrics associated with the respective one or more subsequent clients. In other words, the metrics are input parameters for calculating the training score. Thus, the metrics will affect the possible selection of further subsequent clients for training the ML model.

[0167] In some embodiments, the number of metrics comprises any one or more out of processor load, battery, memory, model drift, training time, upload speed, and energy consumption.

[0168] The processor load may e.g., comprise the processor load of the client while training the ML model.

[0169] The battery may e.g., comprise a state of charge of a battery in the client.

[0170] The memory may e.g., comprise an amount of available memory in the client.

[0171] The model drift may e.g., comprise a parameter indicating the model drift of the ML model as trained by the client.

[0172] The training time may e.g., comprise the time consumed for training the ML model. The upload speed may e.g., comprise the speed at which the client uploads the ML model it has trained.

[0173] The energy consumption may e.g., comprise the energy consumed while training the ML model.

[0174] In some embodiments, each metric is associated with a weight factor. The training score is calculated by weighting each metric with its associated weight factor. This may e.g., mean that each metric is multiplied by its associated weight factor. The training score may then be calculated by adding the one or more weighted metrics. Thus, the score calculation may be adapted by changing the weight factors.

[0175] In some embodiments, calculating the score comprises applying a penalty factor. The penalty factor may e.g., as explained further below, be based on how many times in a row a client has been selected, or not selected, for training the ML model.

[0176] Action 707

[0177] In some embodiments, the set of dictionaries is updated based on the respective subsequent training scores.

[0178] In some embodiments, the set of dictionaries comprises a first dictionary and a second dictionary. The first dictionary is related to the training score of the clients in the set of clients. The second dictionary is related to a selection history of the clients in the set of clients.

[0179] In some embodiments, the penalty factor is increased responsive to a client being selected to train the ML model. In other words, being selected several times in a row is penalized.

[0180] In some embodiments, the penalty factor is set to zero responsive to a client not being selected for training the ML model for a number of iterations larger than a threshold. In other words, once a client has not been selected for training the ML model for that number of iterations, any penalty is removed.

[0181] In some embodiments, the selection history comprises the number of times a client has been selected and the number of iterations of training the ML model since the client was last selected. The second dictionary further comprises the penalty factor.

[0182] In some embodiments, updating the first dictionary comprises replacing a previous score of a client with the calculated score. If a previous score of a client is larger than a score threshold, the penalty factor may be applied to the calculated score.

[0183] In some embodiments, updating the second dictionary comprises updating the penalty factor and the selection history.

[0184] Action 708

In some embodiments, one or more further subsequent clients are selected from the set of clients based on the set of dictionaries. The one or more further subsequent clients may e.g., be selected based on the scores in the first dictionary. In some examples, the clients with the highest scores are selected. The one or more further subsequent clients may e.g., comprise a subset of the clients in the set of clients. The subset comprises a percentage of the clients in the set of clients. The percentage may e.g., be predefined for training the ML model.

[0185] Action 709

[0186] In some embodiments, responsive to a subsequent aggregated ML model, which subsequent aggregated model is aggregated from local models trained by the one or more subsequent clients, not achieving convergence, further training of the ML model by the one or more further subsequent clients is initiated. In other words, if the ML model has not converged after a training iteration, further training of the ML model is performed. Initiating the further training may comprise sending the subsequent aggregated ML model to the selected one or more further subsequent clients for training.

[0187] Action 710

[0188] In some embodiments, responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by one or more clients, achieving convergence, training of the ML model is stopped. In other words, if the ML model has converged after a training iteration, no further training of the ML model is performed, i.e., the ML model is considered to be trained.
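The control flow of Actions 701 to 710 can be illustrated as a loop. The following is a minimal Python sketch under stated assumptions, not the claimed implementation: the callbacks `train_and_score` (local training plus score calculation for the selected clients) and `has_converged` (convergence test of the aggregated model) are hypothetical stand-ins for the NWDAF training and aggregation steps, and the penalty factor is deliberately left out. As further assumptions, clients that have never been scored get an infinite score so that each client is tried at least once, and ties are broken by fewer past selections, then by client name.

```python
import math
from typing import Callable, Dict, List, Sequence


def select_top(scores: Dict[str, float], selections: Dict[str, int],
               fraction: float) -> List[str]:
    """Return the ceil(fraction * N) best clients (assumed tie-break: fewer selections, then name)."""
    k = max(1, math.ceil(fraction * len(scores)))
    ranked = sorted(scores, key=lambda c: (-scores[c], selections[c], c))
    return ranked[:k]


def run_training(clients: Sequence[str], fraction: float,
                 train_and_score: Callable[[List[str]], Dict[str, float]],
                 has_converged: Callable[[], bool],
                 max_rounds: int = 100) -> List[List[str]]:
    """Run rounds until convergence; return the client subset used in each round."""
    # Action 701: initialize the set of dictionaries (score and selection history).
    # Unscored clients get +inf so that each client trains at least once (assumption).
    dict_score: Dict[str, float] = {c: math.inf for c in clients}
    dict_selection: Dict[str, int] = {c: 0 for c in clients}
    selected = select_top(dict_score, dict_selection, fraction)
    history: List[List[str]] = []
    for _ in range(max_rounds):
        history.append(selected)
        # Actions 702/706: the selected clients train and their scores are calculated.
        new_scores = train_and_score(selected)
        # Actions 703/707: replace previous scores and update the selection history.
        dict_score.update(new_scores)
        for c in selected:
            dict_selection[c] += 1
        # Action 710: stop once the aggregated model has converged.
        if has_converged():
            break
        # Actions 704/708 and 705/709: select the next subset and continue training.
        selected = select_top(dict_score, dict_selection, fraction)
    return history
```

With four clients and a fraction of 0.5, the first two rounds cover all clients once, after which the highest-scoring pair is chosen; the `max_rounds` cap is an added safeguard against a convergence test that never succeeds.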

[0189] Embodiments herein such as the embodiments mentioned above will now be further described and exemplified. The text below is applicable to embodiments herein and may be combined with any suitable embodiment described above.

[0190] The key contributions of embodiments herein rely on new components and functionalities introduced into the existing 5G core network architecture. The 5G core network already supports HFL in NWDAF for ML model training. In 5GC, the client / server architecture concept is replaced by the producer / consumer concept to allow interaction between network functions. This architecture is exploited to integrate our new components and functionalities.

[0191] Figure 8 shows a global architecture according to embodiments herein. The main new components are embedded inside the SCP, which allows flexibility by leveraging the consumer / producer concept. The NWDAF clients on the right side participate in a training instance together with the NWDAF server on the left. As described in communication model D, the SCP is now allowed to collect and aggregate dynamic information about the NWDAF clients.

[0192] Each NWDAF involved in the training instance is updated with carbon footprint tools, which are used to extract the energy consumption of the FL operations in each training round.

[0193] The new components are described below.

[0194] DRP4FL

[0195] The Dynamic Resource Profile for FL (DRP4FL) is integrated into the SCP. This dynamic profile is an extension of the NWDAF client profile available in the NRF. The client / server metrics are similar to the performance measurements already available in 5GC, also known as common performance measurements for NFs.

[0196] The service and capacity of each NWDAF client are updated and tracked over training iterations and include:

[0197] The dictionary of the “resource score” of each client NWDAF at the end of each training round, obtained from the RI4FL.

[0198] The dictionary containing the number of times each client NWDAF has been selected over the past training rounds.

[0199] These dictionaries must be initialized at the beginning of the training and updated over training iterations. The two dictionaries are used for inference by the AICS4FL at the beginning of each training round to select the appropriate client NWDAFs.

[0200] RI4FL

[0201] The Resource Information for FL (RI4FL) is a dynamic database used to collect specific health information received over a given period. Each client NWDAF updates the RI4FL with the following metrics at a predefined frequency:

[0202] CPU / GPU load: a lower metric is attributed to a higher CPU / GPU load.

[0203] Battery: could be mapped by the platform (hardware / cloud).

[0204] Memory available: the memory available in the system associated with this client NWDAF, in megabytes (MB). A higher metric is attributed to greater available memory.

[0205] Model drift: the model drift is extracted from the loss computed over the last two epochs, as the percentage change of the loss between these two epochs. A higher metric is attributed to a greater change of the loss.

[0206] Training time: the training time metric is derived from the cumulative time recorded for both client-side model training and model transmission to the server during the last epoch. This value is then normalized by dividing it by the total duration of the last 10 epochs on the server. A higher metric is attributed to a lower training time percentage.

[0207] Upload speed: the rate, in megabits per second (Mbps), at which the model is uploaded over the network, derived from the model size and the time taken for transmission. A higher metric is attributed to a faster upload speed.

[0208] Energy consumption: a higher metric is attributed to lower energy consumption. This value is derived from the cumulative emissions recorded for both client-side model training and model transmission to the server during the last training epoch. It is also normalized by dividing it by the total emissions of the last 10 epochs on the server.
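The raw quantities behind the model drift, training time, upload speed and energy consumption metrics could be computed as in the following minimal Python sketch, before the RI4FL maps them onto its min_metric to max_metric range. The function names are hypothetical, the model drift is assumed to be the absolute relative change of the loss, emissions may be in any consistent unit, and the server-side normalization takes the last 10 entries of a per-epoch list, as described above.

```python
from typing import Sequence


def model_drift_pct(loss_prev: float, loss_last: float) -> float:
    """Percentage change of the loss between the last two epochs (absolute value assumed)."""
    return 100.0 * abs(loss_last - loss_prev) / loss_prev


def training_time_ratio(train_s: float, transmit_s: float,
                        server_epoch_s: Sequence[float]) -> float:
    """Client training + transmission time of the last epoch over the server's last 10 epochs."""
    return (train_s + transmit_s) / sum(server_epoch_s[-10:])


def upload_speed_mbps(model_size_bytes: int, transmit_s: float) -> float:
    """Upload rate in megabits per second from model size (bytes) and transmission time (s)."""
    return model_size_bytes * 8 / 1e6 / transmit_s


def energy_ratio(train_emission: float, transmit_emission: float,
                 server_epoch_emission: Sequence[float]) -> float:
    """Client emissions of the last epoch over the server's emissions for the last 10 epochs."""
    return (train_emission + transmit_emission) / sum(server_epoch_emission[-10:])
```

For example, a 5 MB model uploaded in 2 s gives 20 Mbps, and a client that needs 4 s out of a 100 s server window gets a training time ratio of 0.04.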

[0209] When these metrics are received, the RI4FL normalizes, consolidates and stores them. Specific weighting parameters are applied to each metric so as to balance the selection. The weighting factors could be determined per application. For each metric, a range rather than a fixed value may be considered. The RI4FL maintains a database that is consulted and consumed by the DRP4FL to update the two dictionaries.

[0210] Each metric could take values between min_metric and max_metric, with step values characterizing the performance. The best performance (such as low CPU load, high available battery percentage, low memory consumption relative to installed memory, high contribution to the loss, short training time, high upload speed and low energy consumption) is granted max_metric, and the worst case (high CPU load, low available battery, high memory consumption, low contribution to the loss, long training time, low upload speed and high energy consumption) is assigned min_metric.

[0211] As an example for CPU load, the following assignment may be used for each client NWDAF $i$ with a high CPU load:

[0212] $$\text{if } CPU_{load}\big(client_{NWDAF}(i)\big) > 90\% \text{ then } CPU_{metric}\big(client_{NWDAF}(i)\big) = min_{metric} \qquad (4)$$

[0214] Also,

[0215] $$\text{if } CPU_{load}\big(client_{NWDAF}(i)\big) < 50\% \text{ then } CPU_{metric}\big(client_{NWDAF}(i)\big) = max_{metric} \qquad (5)$$

[0216] Between these two thresholds, the transition from high to low CPU load could be mapped by a linear function or a hyperbola.
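Equations (4) and (5), together with the linear option for the intermediate range, can be sketched in Python as below. The function name and the defaults min_metric = 0 and max_metric = 1 are assumptions for illustration; a hyperbolic or stepped mapping could replace the linear segment between the 50% and 90% thresholds.

```python
def cpu_metric(cpu_load_pct: float, min_metric: float = 0.0, max_metric: float = 1.0,
               low_pct: float = 50.0, high_pct: float = 90.0) -> float:
    """Map a CPU load (in %) to a metric: high load -> min_metric, low load -> max_metric."""
    if cpu_load_pct > high_pct:      # equation (4)
        return min_metric
    if cpu_load_pct < low_pct:       # equation (5)
        return max_metric
    # Linear interpolation from (low_pct, max_metric) down to (high_pct, min_metric).
    frac = (cpu_load_pct - low_pct) / (high_pct - low_pct)
    return max_metric - frac * (max_metric - min_metric)
```

With the defaults, a load of 70% maps to 0.5, halfway between the two thresholds; the mapping is continuous at both 50% and 90%.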

[0217] AICS4FL

At the beginning of each training iteration, the AI-based Client Selection for FL (AICS4FL) model, trained offline, is used to perform the second-layer selection. Herein, the AICS4FL model is designed to emulate the ‘draw with penalized replacement’ strategy, which avoids selecting the same worker in consecutive training iterations in order to obtain a more generalizable model. The AICS4FL ingests the two dictionaries maintained by the DRP4FL at the beginning of each training round and outputs the subset of qualified NWDAF clients for this round. Only the selected NWDAF clients receive the updated global model to update their local models on their local datasets. The remaining NWDAF clients switch to Idle mode, enabling effective energy saving: no computation and no communication.

[0218] Interaction between the new components and functionalities

[0219] Figure 9 shows how the new functional blocks work together when deployed. The RI4FL-DB receives metrics from client NWDAFs and estimates the scores. The scores are saved in the DB and sent to the DRP4FL. The DRP4FL uses the scores to update the two dictionaries: the dictionary of scores and the dictionary of selected clients. The two dictionaries are dynamically updated and sent to the AICS4FL at the beginning of each training round for inference. After inference, the selected client NWDAFs are reported to the server NWDAF, to kick off the next round, and to the DRP4FL, to update the dictionary of selection.

[0220] Offline Training of the AICS4FL

[0221] The offline training of the AICS4FL AI / ML model is set up to reproduce the so-called ‘draw with penalized replacement’. To achieve this goal, two kinds of data need to be extracted using the steps below: the features and the target.

[0222] Data Preparation: feature space and target.

[0223] The data collection is achieved by building the two dictionaries mentioned above. A training scenario, with one central server and a number of workers or clients, is executed, and the following processing actions are performed. CodeCarbon tools may be used to extract the energy consumption of each participant. An MQTT broker is implemented to collect the metrics shared by the clients.

[0224] For each client, upon reception of its metrics, each of the seven metrics ($m_{cpuload}$, $m_{battery}$, $m_{memory}$, $m_{modeldrift}$, $m_{trainingtime}$, $m_{uploadspeed}$ and $m_{energyconsumption}$) is normalized between min_metric and max_metric.

[0225] Each metric $m_i$ is then weighted with a given factor $\alpha_{m_i}$, where:

[0226] $$\sum_{i=1}^{7} \alpha_{m_i} = 1 \qquad (6)$$

[0227] Then, for each participant, the score is derived from the metrics such as:

[0228] $$Score_{client_k} = \alpha_{m_1}\, m_{cpuload} + \alpha_{m_2}\, m_{battery} + \alpha_{m_3}\, m_{memory} + \alpha_{m_4}\, m_{modeldrift} + \alpha_{m_5}\, m_{trainingtime} + \alpha_{m_6}\, m_{uploadspeed} + \alpha_{m_7}\, m_{energyconsumption} \qquad (7)$$
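Equations (6) and (7) amount to a convex combination of the seven normalized metrics. The minimal Python sketch below assumes the metrics are already normalized and keyed by the hypothetical names in `METRICS`; it rejects weight sets that violate equation (6).

```python
import math
from typing import Mapping

# Hypothetical keys for the seven metrics of equation (7).
METRICS = ("cpuload", "battery", "memory", "modeldrift",
           "trainingtime", "uploadspeed", "energyconsumption")


def client_score(metrics: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of the normalized metrics, equation (7)."""
    if not math.isclose(sum(weights[m] for m in METRICS), 1.0):
        raise ValueError("weights must sum to 1, see equation (6)")
    return sum(weights[m] * metrics[m] for m in METRICS)
```

Because the weights sum to 1, the score stays within the same min_metric to max_metric range as the individual metrics, which keeps scores comparable across clients and rounds.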

[0233] In each training round, the scores are stored in the first dictionary:

[0234] $$Dict_{Score} = \{(client_k, Score_{client_k})\} \qquad (8)$$

[0235] The second dictionary is created to track the selection evolution:

[0236] $$Dict_{Selection} = \{(client_k, numberSelection)\} \qquad (9)$$

[0237] The feature space is made of the two dictionaries: Dict_Score and Dict_Selection. As the training evolves, the second dictionary is extended with two more values: the penalty and the selection offset, also labelled “round since selected”:

[0238] $$Dict_{Selection} = \{(client_k, numberSelection, roundSinceSelected, penalty)\} \qquad (10)$$

The penalty of a client is reset to zero if the client has not been selected for more than Threshold_Since_Selected consecutive training rounds. The constant ConstPenalty used to penalize the score lies between 0 and 1.

[0239] The Target: The target of the dataset is the selection distribution of the clients during the training.

[0240] The following algorithm is proposed to select the reference clients / workers in each training round.

[0241] Below is an algorithm, Algorithm 1, for building the target of the data set needed to train the AICS4FL.

[0242] Algorithm 1:

    For every training round:
        For every client whose score is above the predefined threshold (Threshold_score),
        apply the penalty (estimated from the client's selection dynamics):
            Score_client_k = Score_client_k · ConstPenalty^penalty
        Select the clients with the highest scores, according to the percentage of workers allowed in each training round.
        For every client:
            If the client is selected:
                penalty += 1
                round_since_selected = 0
            Else:
                round_since_selected += 1
                If round_since_selected > Threshold_Since_Selected:
                    penalty = 0
        Return: selected clients
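Algorithm 1 can be sketched in Python as follows, as one possible way to generate the selection targets used to train the AICS4FL. This is a minimal sketch under stated assumptions: the names `SelectionEntry` and `select_round` are hypothetical, the allowed percentage of workers is rounded up, ties are broken by client name, and the penalized scores are used only for ranking, so the stored Dict_Score is not overwritten and penalties do not compound on clients that were not retrained.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SelectionEntry:
    """One Dict_Selection entry, see equation (10)."""
    number_selection: int = 0
    round_since_selected: int = 0
    penalty: int = 0


def select_round(dict_score: Dict[str, float],
                 dict_selection: Dict[str, SelectionEntry],
                 fraction: float,
                 threshold_score: float,
                 const_penalty: float,
                 threshold_since_selected: int) -> List[str]:
    """One training round of Algorithm 1; updates dict_selection in place."""
    # Penalize clients scoring above Threshold_score: Score * ConstPenalty**penalty.
    ranked_score = {}
    for client, score in dict_score.items():
        if score > threshold_score:
            score *= const_penalty ** dict_selection[client].penalty
        ranked_score[client] = score
    # Keep the best clients within the allowed percentage of workers.
    k = max(1, math.ceil(fraction * len(ranked_score)))
    selected = sorted(ranked_score, key=lambda c: (-ranked_score[c], c))[:k]
    # Update the selection history and the penalty of every client.
    for client, entry in dict_selection.items():
        if client in selected:
            entry.number_selection += 1
            entry.penalty += 1
            entry.round_since_selected = 0
        else:
            entry.round_since_selected += 1
            if entry.round_since_selected > threshold_since_selected:
                entry.penalty = 0
    return selected
```

With scores {a: 0.9, b: 0.8, c: 0.3, d: 0.2}, half of the workers per round, Threshold_score = 0.5 and ConstPenalty = 0.3, three consecutive rounds select [a, b], then [c, a], then [c, b]: the penalty pushes the repeatedly chosen high scorers down so that other clients get a turn.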


[0258] The AICS4FL could be a neural network with a few layers, flexible enough to ingest the two dictionaries. The model output is trained to reproduce the selection performed by our proposed algorithm. This AI / ML model ranks the client NWDAFs, and the best subset is selected for the next training round accordingly.

[0259] Deployment view of the proposed solution

[0260] Figure 10 shows a deployment view of the Federated Learning framework according to examples of embodiments herein. Examples of embodiments herein are designed to support simultaneous training of different AI / ML models for different use cases. The SCP is updated to support FL and provides the flexibility to handle multiple training instances. Thanks to the energy saved by the solution, multiple training instances could be launched at the same time, which makes models available faster. In Figure 8, up to three training instances are launched in parallel, and even more combinations are possible.

[0261] Traffic scenarios description

[0262] Figures 11a-b show dynamic pairing between the FL server and the FL clients for energy efficiency over training iterations. The traffic scenario is presented as a transition from training iteration k, Figure 11a, to training iteration k+1, Figure 11b. Based on the requirements and configuration, the AICS4FL dynamically selects or updates the client NWDAFs in each training iteration. This flexibility helps the training process avoid stragglers. NWDAF clients with a meaningful contribution participate most of the time, while the system is constructed so as to let all clients contribute to the training. The final global model is expected to capture the patterns from all participants, leading to model generality.

[0263] Sequential flows for Model Training under FL framework

[0264] Figure 12 shows a sequence diagram according to embodiments herein. Assuming the NWDAFs are already provided with data collected from the network exposure function, the following operational steps are performed to complete the sustainable FL based on the target energy reduction, e.g., the percentage “x%” of clients that should participate in each training round.

[0265] Operation 0: A network function consumer (NF Consumer) requests FL model training to NRF.

[0266] Operation 1: NWDAF discovery: The NRF will work to find all reliable and available NWDAFs. The NRF will also check which logical function each NWDAF has to set the role.

[0267] Operation 2: The server NWDAF selects the participating client NWDAFs. This is the first-layer selection, or “Global Selection”, which identifies the client NWDAFs that are reliable or equipped with MTALF (shown in yellow).

[0268] Operation 3: The SCP sets up and initializes the dictionaries with idle values.

[0269] Operation 4: The server NWDAF dispatches the global model to “x%” of the workers. The following steps of the FML are repeated until the model converges.

[0270] o Operation 4.1: Client NWDAFs that have received the global model perform local training on their local datasets.

[0271] o Operation 4.2: The DRP4FL updates the dictionary of selection from the previous training round.

[0272] o Operation 4.3: Each client NWDAF sends its metrics information to the SCP-FL.

[0273] o Operation 4.4: The RI4FL-DB receives the metrics from the client NWDAFs.

[0274] It then applies the weighting factors as defined, derives the score of each client NWDAF and saves it in the DB. Finally, the RI4FL sends the scores to the DRP4FL.

[0275] o Operation 4.5: The DRP4FL uses the received scores to update the dictionary of client NWDAF scores and the dictionary of selection, then sends the two dictionaries to the AI model for inference.

o Operation 4.6: The AICS4FL ingests the two dictionaries and derives the classification.

[0276] o Operation 4.7: According to the selection percentage specified during configuration, the list of selected client NWDAFs is sent to the server NWDAF.

[0277] o Operation 4.8: The selected client NWDAFs (“x%”) send their local models to the server NWDAF.

[0278] o Operation 4.9: The server NWDAF aggregates the local models and forms the global model.

o Operation 4.10: The model is tested. If convergence is achieved, the training exits and the model is sent to the NF consumer. If the model has not converged, the procedure returns to Operation 4 until convergence.

[0279] To perform the method actions above, the training node 130 is configured to handle training of an ML model in the wireless communication network 100. The training node 130 may comprise an arrangement depicted in Figure 13.

[0280] The training node 130 may comprise an input and output interface 10 configured to communicate with each other. The input and output interface 10 may comprise a receiver, e.g. wired and / or wireless, (not shown) and a transmitter, e.g. wired and / or wireless, (not shown).

[0281] The embodiments herein may be implemented through a respective processor or one or more processors, such as at least one processor 11 of a processing circuitry in the training node 130 depicted in Figure 13, together with computer program code for performing the functions and actions of the embodiments herein. The program code mentioned above may also be provided as a computer program product, for instance in the form of a data carrier carrying computer program code for performing the embodiments herein when being loaded into the training node 130. One such carrier may be in the form of a CD ROM disc. It is however feasible with other data carriers such as a memory stick. The computer program code may furthermore be provided as pure program code on a server and downloaded to the training node 130.

[0282] The training node 130 and / or processor 11 is configured to handle training of an ML model in the communication network 100.

[0283] The training node 130 and / or processor 11 is configured to calculate a respective training score for one or more first clients from a set of clients, the one or more first clients having performed training of the ML model.

[0284] The training node 130 and / or processor 11 is configured to update a set of dictionaries based on the respective training scores.

The training node 130 and / or processor 11 is configured to select one or more subsequent clients from the set of clients based on the set of dictionaries.
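Selecting subsequent clients from the set of dictionaries, combined with the selection percentage of Operation 4.7, might look as follows. This is a minimal sketch assuming that the first dictionary maps client identifiers to scores, that the second dictionary holds a `times_selected` count per client, and that ties in score are broken in favour of clients selected less often; the tie-break rule and the round-up to at least one client are assumptions, not taken from the text.

```python
import math


def select_subsequent_clients(score_dict, selection_dict, percentage):
    """Pick the top `percentage` percent of clients (Operation 4.7 style, sketch).

    Ranking: higher score first; ties go to the client selected fewer times so far
    (assumed tie-break). Rounds up so that at least one client is kept.
    """
    if not 0 < percentage <= 100:
        raise ValueError("percentage must be in (0, 100]")
    count = math.ceil(len(score_dict) * percentage / 100)
    ranked = sorted(
        score_dict,
        key=lambda cid: (
            -score_dict[cid],
            selection_dict.get(cid, {}).get("times_selected", 0),
            cid,  # final deterministic tie-break on identifier
        ),
    )
    return ranked[:count]
```

Using the selection history only as a tie-breaker keeps the score as the primary criterion while still spreading participation among equally suitable clients.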

[0285] Responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by the one or more clients, not achieving convergence, the training node 130 and / or processor 11 is configured to initiate further training of the ML model by the one or more subsequent clients.

[0286] Responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by one or more clients, achieving convergence, the training node 130 and / or processor 11 is configured to stop training the ML model.
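The passage leaves both the aggregation rule and the convergence test open. As a minimal sketch, assuming FedAvg-style averaging weighted by each client's local sample count (a common choice, swapped in here for illustration) and a hypothetical criterion that treats the model as converged when the global loss changes by less than a tolerance, the two decisions above could be expressed as:

```python
def aggregate(local_models, sample_counts):
    """Sample-weighted average of parameter vectors (FedAvg-style; assumed, not specified in the text)."""
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("sample counts must sum to a positive number")
    dim = len(local_models[0])
    return [
        sum(model[i] * n for model, n in zip(local_models, sample_counts)) / total
        for i in range(dim)
    ]


def has_converged(previous_loss, current_loss, tolerance=1e-3):
    """Hypothetical convergence test: the global loss improved by less than `tolerance`."""
    return previous_loss is not None and abs(previous_loss - current_loss) < tolerance
```

If `has_converged` returns False, further training is initiated with the subsequent clients; otherwise training stops.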

[0287] In some embodiments, responsive to the training node 130 and / or processor 11 being configured to initiate further training of the ML model, the training node 130 and / or processor 11 is configured to:

[0288] calculate a respective subsequent training score for one or more subsequent clients from the set of clients, the one or more subsequent clients having performed training of the ML model,

[0289] update the set of dictionaries based on the respective subsequent training scores, select one or more further subsequent clients from the set of clients based on the set of dictionaries, and

[0290] responsive to a subsequent aggregated ML model, which subsequent aggregated model is aggregated from local models trained by the one or more subsequent clients, not achieving convergence, initiate further training of the ML model by the one or more further subsequent clients, or

[0291] responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by one or more clients, achieving convergence, stop training the ML model.
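The iterative behaviour described above (score the clients that trained, update the dictionaries, select further clients, and repeat until the aggregated model converges) can be sketched as a loop. The callables `score_fn` and `train_round`, the `fraction`, `max_rounds` and `tolerance` parameters, and the assumption that every client trains in the first round are all hypothetical; the second dictionary is simplified to a selection count.

```python
import math
from typing import Callable, Dict, List, Tuple


def run_training(
    clients: List[str],
    score_fn: Callable[[str], float],
    train_round: Callable[[List[str]], float],
    fraction: float = 0.5,
    max_rounds: int = 50,
    tolerance: float = 1e-3,
) -> Tuple[int, List[str]]:
    """Iterate score -> update dictionaries -> select -> train/aggregate -> test.

    score_fn(client_id): training score of a client that has just trained (hypothetical).
    train_round(selected): trains, aggregates, and returns the global loss (hypothetical).
    Returns the round in which training stopped and the most recent client selection.
    """
    scores: Dict[str, float] = {cid: 0.0 for cid in clients}       # first dictionary
    times_selected: Dict[str, int] = {cid: 0 for cid in clients}   # simplified second dictionary
    selected = list(clients)  # assumption: every client trains in the first round
    previous_loss = None
    for round_no in range(1, max_rounds + 1):
        loss = train_round(selected)
        if previous_loss is not None and abs(previous_loss - loss) < tolerance:
            return round_no, selected  # aggregated model converged: stop training
        previous_loss = loss
        for cid in selected:  # only clients that trained receive a new score
            scores[cid] = score_fn(cid)
            times_selected[cid] += 1
        count = max(1, math.ceil(len(clients) * fraction))
        # Higher score first; ties favour clients selected less often (assumption).
        selected = sorted(clients, key=lambda c: (-scores[c], times_selected[c], c))[:count]
    return max_rounds, selected
```

The `max_rounds` cap is a safeguard added for the sketch, so that the loop terminates even if the convergence criterion is never met.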

[0292] In some embodiments, the training score is calculated based on a number of training metrics associated with the respective one or more first clients.

[0293] In some embodiments, the number of metrics comprises any one or more out of processor load, battery, memory, model drift, training time, upload speed, and energy consumption.
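The listed metrics have different units and, for selection purposes, different directions: a higher battery level is favourable while a higher energy consumption is not. A minimal sketch, assuming min-max scaling across clients and a hypothetical orientation table (including the assumption that "memory" means available memory), brings them onto a common 0 to 1 scale where 1 is always the most favourable observed value:

```python
# Hypothetical orientation: True means "higher is better" for client selection.
HIGHER_IS_BETTER = {
    "processor_load": False,
    "battery": True,
    "memory": True,  # interpreted as available memory (assumption)
    "model_drift": False,
    "training_time": False,
    "upload_speed": True,
    "energy_consumption": False,
}


def normalize_metrics(raw):
    """Min-max scale each metric to [0, 1] across clients, flipping cost metrics.

    raw maps client_id -> {metric_name: value}; the result has the same shape.
    If all clients report the same value, every client gets 0.5 for that metric.
    """
    metric_names = {m for values in raw.values() for m in values}
    result = {cid: {} for cid in raw}
    for m in metric_names:
        column = [values[m] for values in raw.values() if m in values]
        lo, hi = min(column), max(column)
        for cid, values in raw.items():
            if m not in values:
                continue
            scaled = 0.5 if hi == lo else (values[m] - lo) / (hi - lo)
            result[cid][m] = scaled if HIGHER_IS_BETTER.get(m, True) else 1.0 - scaled
    return result
```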

[0294] In some embodiments, each metric is associated with a weight factor, and wherein the training score is calculated by weighting each metric with its associated weight factor.
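The weighted calculation can be read as S_i = (sum over k of w_k * m_ik) / (sum over k of w_k), where m_ik is metric k of client i and w_k its weight factor. Dividing by the sum of the weights is an assumption made here so that scores remain on the metric scale; the text only states that each metric is weighted by its factor. A minimal sketch:

```python
from typing import Dict


def training_score(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted score of one client from its (normalized) metrics.

    Weights are divided by their sum so that scores stay on the metric scale
    (an assumption); metrics without a weight contribute nothing.
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return sum(weights.get(name, 0.0) * value for name, value in metrics.items()) / total
```

For example, with a battery weight of 3 and a training-time weight of 1, normalized values of 1.0 and 0.5 give a score of 3.5 / 4 = 0.875.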

[0295] In some embodiments, calculating the score comprises applying a penalty factor. In some embodiments, the penalty factor is increased responsive to a client being selected to train the ML model, and wherein the penalty factor is set to zero responsive to a client not being selected for training the ML model for a number of iterations larger than a threshold.
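The penalty rule above can be sketched as a small update function. The increment `step`, the threshold `reset_after`, leaving the penalty unchanged in the remaining case, and applying the penalty by subtraction are all assumptions; the text specifies only that the penalty increases on selection and is reset to zero after more than a threshold number of unselected iterations.

```python
def update_penalty(penalty, selected, rounds_unselected, step=0.1, reset_after=3):
    """Penalty rule from the text, with hypothetical step and threshold values.

    - Selected this round: the penalty grows by `step`.
    - Not selected for more than `reset_after` iterations: the penalty resets to zero.
    - Otherwise the penalty is left unchanged (assumption; not stated in the text).
    """
    if selected:
        return penalty + step
    if rounds_unselected > reset_after:
        return 0.0
    return penalty


def penalized_score(score, penalty):
    """Apply the penalty to a training score (subtraction is an assumption)."""
    return score - penalty
```

The growing penalty discourages repeatedly selecting the same clients, while the reset lets rested clients compete again on their training score alone.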

[0296] In some embodiments, the set of dictionaries comprises a first dictionary and a second dictionary, and wherein the first dictionary is related to the training score of the clients in the set of clients, and wherein the second dictionary is related to a selection history of the clients in the set of clients.

[0297] In some embodiments, the selection history comprises a number of times a client has been selected and a number of iterations of training the ML model since the client was selected, and wherein the second dictionary further comprises the penalty factor.
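One entry of the second dictionary could be modelled as a small record holding the selection count, the iterations since last selection, and the penalty factor. The following is a minimal sketch assuming per-iteration bookkeeping in place; the class and function names are hypothetical, and for a client never selected the iteration counter simply counts from the start of training.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class SelectionRecord:
    """One entry of the second dictionary (selection history plus penalty)."""
    times_selected: int = 0
    rounds_since_selected: int = 0
    penalty: float = 0.0  # maintained by the penalty rule; not changed here


def advance_round(selection_dict: Dict[str, SelectionRecord], selected_ids: Set[str]) -> None:
    """Update every client's history after one training iteration (in place)."""
    for client_id, record in selection_dict.items():
        if client_id in selected_ids:
            record.times_selected += 1
            record.rounds_since_selected = 0
        else:
            record.rounds_since_selected += 1
```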

[0298] In some embodiments, responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by one or more clients, achieving convergence, the training node 130 and / or processor 11 is further configured to stop training the ML model.

[0299] In some embodiments, the training node 130 is further configured to comprise a client selection function 131, a client profile function 132 and a client resource function 133. The functions 131, 132, 133 are configured to perform the steps according to any of the embodiments herein.
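This passage names the client selection function 131, the client profile function 132 and the client resource function 133 but does not assign responsibilities among them. One hypothetical split, inferred only from the function names, is sketched below: function 133 holds reported resource metrics, function 132 turns them into a score, and function 131 ranks and selects clients. The class names, methods and simple weighted-sum scoring are illustrative assumptions.

```python
from typing import Dict, List


class ClientResourceFunction:
    """Hypothetical role for function 133: hold the latest resource metrics per client."""

    def __init__(self) -> None:
        self.metrics: Dict[str, Dict[str, float]] = {}

    def report(self, client_id: str, metrics: Dict[str, float]) -> None:
        self.metrics[client_id] = dict(metrics)


class ClientProfileFunction:
    """Hypothetical role for function 132: turn resource metrics into a training score."""

    def __init__(self, weights: Dict[str, float]) -> None:
        self.weights = weights

    def score(self, metrics: Dict[str, float]) -> float:
        return sum(self.weights.get(k, 0.0) * v for k, v in metrics.items())


class ClientSelectionFunction:
    """Hypothetical role for function 131: rank clients and pick the top `count`."""

    def __init__(self, resources: ClientResourceFunction, profiles: ClientProfileFunction) -> None:
        self.resources = resources
        self.profiles = profiles

    def select(self, count: int) -> List[str]:
        scores = {cid: self.profiles.score(m) for cid, m in self.resources.metrics.items()}
        return sorted(scores, key=lambda cid: (-scores[cid], cid))[:count]
```

Keeping the three roles in separate objects mirrors the possibility, noted in the text, that the functions are co-located in the training node 130 or deployed separately.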

[0300] The training node 130 may further comprise a respective memory 12 comprising one or more memory units. The memory 12 comprises instructions executable by the processor 11 in the training node 130.

[0301] The memory 12 is arranged to be used to store instructions, data, configurations, models, dictionaries, scores, metrics, requests, responses, messages, identifiers, indications, parameters, and applications to perform the methods herein when being executed in the training node 130.

[0302] In some embodiments, a computer program 13 comprises instructions, which when executed by the at least one processor 11, cause the at least one processor 11 of the training node 130 to perform the actions above.

[0303] In some embodiments, a respective carrier 14 comprises the respective computer program 13, wherein the carrier 14 is one of an electronic signal, an optical signal, an electromagnetic signal, a magnetic signal, an electric signal, a radio signal, a microwave signal, or a computer-readable storage medium.

[0304] Thus, embodiments herein may disclose the training node 130 e.g., configured to handle training of an ML model in the wireless communication network 100. The training node 130 comprises the processor 11 and the memory 12, said memory 12 comprising instructions executable by said processor 11 whereby said training node 130 is operative to perform any of the methods herein.

[0305] As will be readily understood by those familiar with communications design, functions, means or modules may be implemented using digital logic and / or one or more microcontrollers, microprocessors, or other digital hardware. In some embodiments, several or all of the various functions may be implemented together, such as in a single application-specific integrated circuit (ASIC), or in two or more separate devices with appropriate hardware and / or software interfaces between them. Several of the functions may be implemented on a processor shared with other functional components of a base station, for example.

[0306] Alternatively, several of the functional elements of the processing means discussed may be provided through the use of dedicated hardware, while others are provided with hardware for executing software, in association with the appropriate software or firmware. Thus, the term “processor” or “controller” as used herein does not exclusively refer to hardware capable of executing software and may implicitly include, without limitation, digital signal processor (DSP) hardware, read-only memory (ROM) for storing software, random-access memory for storing software and / or program or application data, and nonvolatile memory. Other hardware, conventional and / or custom, may also be included. Designers of communications receivers will appreciate the cost, performance, and maintenance trade-offs inherent in these design choices.

[0307] Any appropriate steps, methods, features, functions, or benefits disclosed herein may be performed through one or more functional units or modules of one or more virtual apparatuses. Each virtual apparatus may comprise a number of these functional units. These functional units may be implemented via processing circuitry, which may include one or more microprocessors or microcontrollers, as well as other digital hardware, which may include digital signal processors (DSPs), special-purpose digital logic, and the like. The processing circuitry may be configured to execute program code stored in memory, which may include one or several types of memory such as read-only memory (ROM), random-access memory (RAM), cache memory, flash memory devices, optical storage devices, etc. Program code stored in memory includes program instructions for executing one or more telecommunications and / or data communications protocols as well as instructions for carrying out one or more of the techniques described herein. In some implementations, the processing circuitry may be used to cause the respective functional unit to perform corresponding functions according to one or more embodiments of the present disclosure.

To perform the method actions above, the client selection function 131 is configured to handle training of an ML model in the wireless communication network 100. The client selection function 131 may comprise an arrangement depicted in Figure 14.

[0308] The client selection function 131 may comprise an input and output interface 20 configured to communicate with each other. The input and output interface 20 may comprise a receiver, e.g. wired and / or wireless, (not shown) and a transmitter, e.g. wired and / or wireless, (not shown).

[0309] The embodiments herein may be implemented through a respective processor or one or more processors, such as at least one processor 21 of a processing circuitry in the client selection function 131 depicted in Figure 14, together with computer program code for performing the functions and actions of the embodiments herein. The program code mentioned above may also be provided as a computer program product, for instance in the form of a data carrier carrying computer program code for performing the embodiments herein when being loaded into the client selection function 131. One such carrier may be in the form of a CD ROM disc. It is however feasible with other data carriers such as a memory stick. The computer program code may furthermore be provided as pure program code on a server and downloaded to the client selection function 131.

[0310] The client selection function 131 and / or processor 21 is configured to handle training of an ML model in the communication network 100.

[0311] The client selection function 131 and / or processor 21 is configured to calculate a respective training score for one or more first clients from a set of clients, the one or more first clients having performed training of the ML model.

[0312] The client selection function 131 and / or processor 21 is configured to update a set of dictionaries based on the respective training scores.

[0313] The client selection function 131 and / or processor 21 is configured to select one or more subsequent clients from the set of clients based on the set of dictionaries.

[0314] Responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by the one or more clients, not achieving convergence, the client selection function 131 and / or processor 21 is configured to initiate further training of the ML model by the one or more subsequent clients.

[0315] Responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by one or more clients, achieving convergence, the client selection function 131 and / or processor 21 is configured to stop training the ML model.

In some embodiments, responsive to the client selection function 131 and / or processor 21 being configured to initiate further training of the ML model, the client selection function 131 and / or processor 21 is configured to:

[0316] calculate a respective subsequent training score for one or more subsequent clients from the set of clients, the one or more subsequent clients having performed training of the ML model,

[0317] update the set of dictionaries based on the respective subsequent training scores, select one or more further subsequent clients from the set of clients based on the set of dictionaries, and

[0318] responsive to a subsequent aggregated ML model, which subsequent aggregated model is aggregated from local models trained by the one or more subsequent clients, not achieving convergence, initiate further training of the ML model by the one or more further subsequent clients, or

[0319] responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by one or more clients, achieving convergence, stop training the ML model.

[0320] In some embodiments, the training score is calculated based on a number of training metrics associated with the respective one or more first clients.

[0321] In some embodiments, the number of metrics comprises any one or more out of processor load, battery, memory, model drift, training time, upload speed, and energy consumption.

[0322] In some embodiments, each metric is associated with a weight factor, and wherein the training score is calculated by weighting each metric with its associated weight factor.

[0323] In some embodiments, calculating the score comprises applying a penalty factor. In some embodiments, the penalty factor is increased responsive to a client being selected to train the ML model, and wherein the penalty factor is set to zero responsive to a client not being selected for training the ML model for a number of iterations larger than threshold.

[0324] In some embodiments, the set of dictionaries comprises a first dictionary and a second dictionary, and wherein the first dictionary is related to the training score of the clients in the set of clients, and wherein the second dictionary is related to a selection history of the clients in the set of clients.

[0325] In some embodiments, the selection history comprises a number of times a client has been selected and a number of iterations of training the ML model since the client was selected, and wherein the second dictionary further comprises the penalty factor.

In some embodiments, responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by one or more clients, achieving convergence, the client selection function 131 and / or processor 21 is further configured to stop training the ML model.

[0326] In some embodiments, the client selection function 131 is further configured to comprise a client selection function 131, a client profile function 132 and a client resource function 133. The functions 131, 132, 133 are configured to perform the steps according to any of the embodiments herein.

[0327] The client selection function 131 may further comprise a respective memory 22 comprising one or more memory units. The memory 22 comprises instructions executable by the processor 21 in the client selection function 131.

[0328] The memory 22 is arranged to be used to store instructions, data, configurations, models, dictionaries, scores, metrics, requests, responses, messages, identifiers, indications, parameters, and applications to perform the methods herein when being executed in the client selection function 131.

[0329] In some embodiments, a computer program 23 comprises instructions, which when executed by the at least one processor 21, cause the at least one processor 21 of the client selection function 131 to perform the actions above.

[0330] In some embodiments, a respective carrier 24 comprises the respective computer program 23, wherein the carrier 24 is one of an electronic signal, an optical signal, an electromagnetic signal, a magnetic signal, an electric signal, a radio signal, a microwave signal, or a computer-readable storage medium.

[0331] Thus, embodiments herein may disclose the client selection function 131 e.g., configured to handle training of an ML model in the wireless communication network 100. The client selection function 131 comprises the processor 21 and the memory 22, said memory 22 comprising instructions executable by said processor 21 whereby said client selection function 131 is operative to perform any of the methods herein.

[0332] As will be readily understood by those familiar with communications design, functions, means or modules may be implemented using digital logic and / or one or more microcontrollers, microprocessors, or other digital hardware. In some embodiments, several or all of the various functions may be implemented together, such as in a single application-specific integrated circuit (ASIC), or in two or more separate devices with appropriate hardware and / or software interfaces between them. Several of the functions may be implemented on a processor shared with other functional components of a base station, for example.

Alternatively, several of the functional elements of the processing means discussed may be provided through the use of dedicated hardware, while others are provided with hardware for executing software, in association with the appropriate software or firmware. Thus, the term “processor” or “controller” as used herein does not exclusively refer to hardware capable of executing software and may implicitly include, without limitation, digital signal processor (DSP) hardware, read-only memory (ROM) for storing software, random-access memory for storing software and / or program or application data, and nonvolatile memory. Other hardware, conventional and / or custom, may also be included. Designers of communications receivers will appreciate the cost, performance, and maintenance trade-offs inherent in these design choices.

[0333] Any appropriate steps, methods, features, functions, or benefits disclosed herein may be performed through one or more functional units or modules of one or more virtual apparatuses. Each virtual apparatus may comprise a number of these functional units. These functional units may be implemented via processing circuitry, which may include one or more microprocessors or microcontrollers, as well as other digital hardware, which may include digital signal processors (DSPs), special-purpose digital logic, and the like. The processing circuitry may be configured to execute program code stored in memory, which may include one or several types of memory such as read-only memory (ROM), random-access memory (RAM), cache memory, flash memory devices, optical storage devices, etc. Program code stored in memory includes program instructions for executing one or more telecommunications and / or data communications protocols as well as instructions for carrying out one or more of the techniques described herein. In some implementations, the processing circuitry may be used to cause the respective functional unit to perform corresponding functions according to one or more embodiments of the present disclosure.

[0334] To perform the method actions above, the client profile function 132 is configured to handle training of an ML model in the wireless communication network 100. The client profile function 132 may comprise an arrangement depicted in Figure 15.

[0335] The client profile function 132 may comprise an input and output interface 30 configured to communicate with each other. The input and output interface 30 may comprise a receiver, e.g. wired and / or wireless, (not shown) and a transmitter, e.g. wired and / or wireless, (not shown).

[0336] The embodiments herein may be implemented through a respective processor or one or more processors, such as at least one processor 31 of a processing circuitry in the client profile function 132 depicted in Figure 15, together with computer program code for performing the functions and actions of the embodiments herein. The program code mentioned above may also be provided as a computer program product, for instance in the form of a data carrier carrying computer program code for performing the embodiments herein when being loaded into the client profile function 132. One such carrier may be in the form of a CD ROM disc. It is however feasible with other data carriers such as a memory stick. The computer program code may furthermore be provided as pure program code on a server and downloaded to the client profile function 132.

[0337] The client profile function 132 and / or processor 31 is configured to handle training of an ML model in the communication network 100.

[0338] The client profile function 132 and / or processor 31 is configured to calculate a respective training score for one or more first clients from a set of clients, the one or more first clients having performed training of the ML model.

[0339] The client profile function 132 and / or processor 31 is configured to update a set of dictionaries based on the respective training scores.

[0340] The client profile function 132 and / or processor 31 is configured to select one or more subsequent clients from the set of clients based on the set of dictionaries.

[0341] Responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by the one or more clients, not achieving convergence, the client profile function 132 and / or processor 31 is configured to initiate further training of the ML model by the one or more subsequent clients.

[0342] Responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by one or more clients, achieving convergence, the client profile function 132 and / or processor 31 is configured to stop training the ML model.

[0343] In some embodiments, responsive to the client profile function 132 and / or processor 31 being configured to initiate further training of the ML model, the client profile function 132 and / or processor 31 is configured to:

[0344] calculate a respective subsequent training score for one or more subsequent clients from the set of clients, the one or more subsequent clients having performed training of the ML model,

[0345] update the set of dictionaries based on the respective subsequent training scores, select one or more further subsequent clients from the set of clients based on the set of dictionaries, and

responsive to a subsequent aggregated ML model, which subsequent aggregated model is aggregated from local models trained by the one or more subsequent clients, not achieving convergence, initiate further training of the ML model by the one or more further subsequent clients, or

[0346] responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by one or more clients, achieving convergence, stop training the ML model.

[0347] In some embodiments, the training score is calculated based on a number of training metrics associated with the respective one or more first clients.

[0348] In some embodiments, the number of metrics comprises any one or more out of processor load, battery, memory, model drift, training time, upload speed, and energy consumption.

[0349] In some embodiments, each metric is associated with a weight factor, and wherein the training score is calculated by weighting each metric with its associated weight factor.

[0350] In some embodiments, calculating the score comprises applying a penalty factor. In some embodiments, the penalty factor is increased responsive to a client being selected to train the ML model, and wherein the penalty factor is set to zero responsive to a client not being selected for training the ML model for a number of iterations larger than threshold.

[0351] In some embodiments, the set of dictionaries comprises a first dictionary and a second dictionary, and wherein the first dictionary is related to the training score of the clients in the set of clients, and wherein the second dictionary is related to a selection history of the clients in the set of clients.

[0352] In some embodiments, the selection history comprises a number of times a client has been selected and a number of iterations of training the ML model since the client was selected, and wherein the second dictionary further comprises the penalty factor.

[0353] In some embodiments, responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by one or more clients, achieving convergence, the client profile function 132 and / or processor 31 is further configured to stop training the ML model.

[0354] In some embodiments, the client profile function 132 is further configured to comprise a client selection function 131, a client profile function 132 and a client resource function 133. The functions 131, 132, 133 are configured to perform the steps according to any of the embodiments herein.

The client profile function 132 may further comprise a respective memory 32 comprising one or more memory units. The memory 32 comprises instructions executable by the processor 31 in the client profile function 132.

[0355] The memory 32 is arranged to be used to store instructions, data, configurations, models, dictionaries, scores, metrics, requests, responses, messages, identifiers, indications, parameters, and applications to perform the methods herein when being executed in the client profile function 132.

[0356] In some embodiments, a computer program 33 comprises instructions, which when executed by the at least one processor 31, cause the at least one processor 31 of the client profile function 132 to perform the actions above.

[0357] In some embodiments, a respective carrier 34 comprises the respective computer program 33, wherein the carrier 34 is one of an electronic signal, an optical signal, an electromagnetic signal, a magnetic signal, an electric signal, a radio signal, a microwave signal, or a computer-readable storage medium.

[0358] Thus, embodiments herein may disclose the client profile function 132 e.g., configured to handle training of an ML model in the wireless communication network 100. The client profile function 132 comprises the processor 31 and the memory 32, said memory 32 comprising instructions executable by said processor 31 whereby said client profile function 132 is operative to perform any of the methods herein.

[0359] As will be readily understood by those familiar with communications design, functions, means or modules may be implemented using digital logic and / or one or more microcontrollers, microprocessors, or other digital hardware. In some embodiments, several or all of the various functions may be implemented together, such as in a single application-specific integrated circuit (ASIC), or in two or more separate devices with appropriate hardware and / or software interfaces between them. Several of the functions may be implemented on a processor shared with other functional components of a base station, for example.

[0360] Alternatively, several of the functional elements of the processing means discussed may be provided through the use of dedicated hardware, while others are provided with hardware for executing software, in association with the appropriate software or firmware. Thus, the term “processor” or “controller” as used herein does not exclusively refer to hardware capable of executing software and may implicitly include, without limitation, digital signal processor (DSP) hardware, read-only memory (ROM) for storing software, random-access memory for storing software and / or program or application data, and nonvolatile memory. Other hardware, conventional and / or custom, may also be included. Designers of communications receivers will appreciate the cost, performance, and maintenance trade-offs inherent in these design choices.

[0361] Any appropriate steps, methods, features, functions, or benefits disclosed herein may be performed through one or more functional units or modules of one or more virtual apparatuses. Each virtual apparatus may comprise a number of these functional units. These functional units may be implemented via processing circuitry, which may include one or more microprocessors or microcontrollers, as well as other digital hardware, which may include digital signal processors (DSPs), special-purpose digital logic, and the like. The processing circuitry may be configured to execute program code stored in memory, which may include one or several types of memory such as read-only memory (ROM), random-access memory (RAM), cache memory, flash memory devices, optical storage devices, etc. Program code stored in memory includes program instructions for executing one or more telecommunications and / or data communications protocols as well as instructions for carrying out one or more of the techniques described herein. In some implementations, the processing circuitry may be used to cause the respective functional unit to perform corresponding functions according to one or more embodiments of the present disclosure.

[0362] To perform the method actions above, the client resource function 133 is configured to handle training of an ML model in the wireless communication network 100. The client resource function 133 may comprise an arrangement depicted in Figure 16.

[0363] The client resource function 133 may comprise an input and output interface 40 configured to communicate with each other. The input and output interface 40 may comprise a receiver, e.g. wired and / or wireless, (not shown) and a transmitter, e.g. wired and / or wireless, (not shown).

[0364] The embodiments herein may be implemented through a respective processor or one or more processors, such as at least one processor 41 of a processing circuitry in the client resource function 133 depicted in Figure 16, together with computer program code for performing the functions and actions of the embodiments herein. The program code mentioned above may also be provided as a computer program product, for instance in the form of a data carrier carrying computer program code for performing the embodiments herein when being loaded into the client resource function 133. One such carrier may be in the form of a CD ROM disc. It is however feasible with other data carriers such as a memory stick. The computer program code may furthermore be provided as pure program code on a server and downloaded to the client resource function 133.

[0365] The client resource function 133 and / or processor 41 is configured to handle training of an ML model in the communication network 100.

[0366] The client resource function 133 and / or processor 41 is configured to calculate a respective training score for one or more first clients from a set of clients, the one or more first clients having performed training of the ML model.

[0367] The client resource function 133 and / or processor 41 is configured to update a set of dictionaries based on the respective training scores.

[0368] The client resource function 133 and / or processor 41 is configured to select one or more subsequent clients from the set of clients based on the set of dictionaries.

[0369] Responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by the one or more clients, not achieving convergence, the client resource function 133 and / or processor 41 is configured to initiate further training of the ML model by the one or more subsequent clients.

[0370] Responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by one or more clients, achieving convergence, the client resource function 133 and / or processor 41 is configured to stop training the ML model.

[0371] In some embodiments, responsive to the client resource function 133 and / or processor 41 being configured to initiate further training of the ML model, the client resource function 133 and / or processor 41 is configured to:

[0372] calculate a respective subsequent training score for one or more subsequent clients from the set of clients, the one or more subsequent clients having performed training of the ML model,

[0373] update the set of dictionaries based on the respective subsequent training scores, select one or more further subsequent clients from the set of clients based on the set of dictionaries, and

[0374] responsive to a subsequent aggregated ML model, which subsequent aggregated model is aggregated from local models trained by the one or more subsequent clients, not achieving convergence, initiate further training of the ML model by the one or more further subsequent clients, or

[0375] responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by one or more clients, achieving convergence, stop training the ML model.

In some embodiments, the training score is calculated based on a number of training metrics associated with the respective one or more first clients.

[0376] In some embodiments, the number of metrics comprises any one or more of processor load, battery, memory, model drift, training time, upload speed, and energy consumption.

[0377] In some embodiments, each metric is associated with a weight factor, and wherein the training score is calculated by weighting each metric with its associated weight factor.
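Paragraphs [0376] and [0377] describe the training score as a weighted combination of per-client metrics. One possible reading is a weighted sum, sketched below. The metric keys, the assumption that every metric is normalised to [0, 1], and the choice to invert cost-type metrics (processor load, training time, energy consumption) so that a higher score always means a more suitable client are illustrative assumptions; the document does not give the exact formula.

```python
from typing import Mapping

# Hypothetical metric names for which a lower raw value is better (assumption).
COST_METRICS = frozenset({"processor_load", "training_time", "energy_consumption"})


def training_score(metrics: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of normalised client metrics (assumed scoring rule).

    Every metric is expected in [0, 1]; cost-type metrics are inverted so that a
    higher score always indicates a more suitable client.
    """
    score = 0.0
    for name, weight in weights.items():
        value = metrics[name]  # raises KeyError if a weighted metric is missing
        if name in COST_METRICS:
            value = 1.0 - value
        score += weight * value
    return score
```

With weights of 0.4 for battery and 0.2 each for upload speed, processor load and energy consumption, and metric values of 0.8, 0.5, 0.4 and 0.2 respectively, the score is 0.4 × 0.8 + 0.2 × 0.5 + 0.2 × 0.6 + 0.2 × 0.8 = 0.70.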

[0378] In some embodiments, calculating the score comprises applying a penalty factor. In some embodiments, the penalty factor is increased responsive to a client being selected to train the ML model, and wherein the penalty factor is set to zero responsive to a client not being selected for training the ML model for a number of iterations larger than a threshold.
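The penalty rule in paragraph [0378] can be sketched as a small per-client state update. The additive step size `step`, the subtraction of the penalty from the raw score, and the names `PenaltyState`, `update_penalty` and `penalised_score` are hypothetical; the document states only that the penalty grows when a client is selected and is reset to zero once the client has gone unselected for more than a threshold number of iterations.

```python
from dataclasses import dataclass


@dataclass
class PenaltyState:
    """Per-client penalty bookkeeping (hypothetical structure)."""
    penalty: float = 0.0
    rounds_since_selected: int = 0


def update_penalty(state: PenaltyState, selected: bool, step: float, threshold: int) -> None:
    """Apply the increase / reset rule after one training iteration."""
    if selected:
        # Selected again: make the client less attractive for the next round.
        state.penalty += step
        state.rounds_since_selected = 0
    else:
        state.rounds_since_selected += 1
        # Unselected for more than `threshold` iterations: clear the penalty.
        if state.rounds_since_selected > threshold:
            state.penalty = 0.0


def penalised_score(raw_score: float, state: PenaltyState) -> float:
    """Assumed combination: subtract the penalty from the raw training score."""
    return raw_score - state.penalty
```

Subtracting the penalty means that frequently selected clients gradually drop in the ranking, so training is rotated among clients; a multiplicative penalty would be an equally plausible reading.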

[0379] In some embodiments, the set of dictionaries comprises a first dictionary and a second dictionary, and wherein the first dictionary is related to the training score of the clients in the set of clients, and wherein the second dictionary is related to a selection history of the clients in the set of clients.

[0380] In some embodiments, the selection history comprises a number of times a client has been selected and a number of iterations of training the ML model since the client was selected, and wherein the second dictionary further comprises the penalty factor.
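Paragraphs [0379] and [0380] describe two dictionaries: one holding each client's training score, and one holding its selection history together with the penalty factor. A minimal sketch of that layout and of a selection step that reads both follows; the field names, the top-`k` rule and the subtraction of the penalty from the score are illustrative assumptions rather than details given in the document.

```python
from typing import Dict, List, TypedDict


class SelectionHistory(TypedDict):
    """Entry in the second dictionary (hypothetical field names)."""
    times_selected: int
    iterations_since_selected: int
    penalty: float


def select_clients(
    scores: Dict[str, float],
    history: Dict[str, SelectionHistory],
    k: int,
) -> List[str]:
    """Return the k clients with the highest score minus penalty (assumed rule)."""
    ranked = sorted(scores, key=lambda c: scores[c] - history[c]["penalty"], reverse=True)
    return ranked[:k]


def record_selection(history: Dict[str, SelectionHistory], chosen: List[str]) -> None:
    """Update the selection-history dictionary after one training iteration."""
    for client, entry in history.items():
        if client in chosen:
            entry["times_selected"] += 1
            entry["iterations_since_selected"] = 0
        else:
            entry["iterations_since_selected"] += 1
```

Keeping the score and the history in separate dictionaries lets the score be recomputed from fresh metrics each round while the history accumulates across rounds.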

[0381] In some embodiments, responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by one or more clients, achieving convergence, the client resource function 133 and / or processor 41 is further configured to stop training the ML model.

[0382] In some embodiments, the client resource function 133 is further configured to comprise respective functions 131, 132 and 133. The functions 131, 132, 133 are configured to perform the steps according to any of the embodiments herein.

[0383] The client resource function 133 may further comprise a respective memory 42 comprising one or more memory units. The memory 42 comprises instructions executable by the processor 41 in the client resource function 133.

[0384] The memory 42 is arranged to store instructions, data, configurations, models, dictionaries, scores, metrics, requests, responses, messages, identifiers, indications, parameters, and applications to perform the methods herein when being executed in the client resource function 133.

In some embodiments, a computer program 43 comprises instructions which, when executed by the at least one processor 41, cause the at least one processor 41 of the client resource function 133 to perform the actions above.

[0385] In some embodiments, a respective carrier 44 comprises the respective computer program 43, wherein the carrier 44 is one of an electronic signal, an optical signal, an electromagnetic signal, a magnetic signal, an electric signal, a radio signal, a microwave signal, or a computer-readable storage medium.

[0386] Thus, embodiments herein may disclose the client resource function 133, e.g., configured to handle training of an ML model in the wireless communication network 100. The client resource function 133 comprises the processor 41 and the memory 42, said memory 42 comprising instructions executable by said processor 41 whereby said client resource function 133 is operative to perform any of the methods herein.

[0387] As will be readily understood by those familiar with communications design, functions, means or modules may be implemented using digital logic and / or one or more microcontrollers, microprocessors, or other digital hardware. In some embodiments, several or all of the various functions may be implemented together, such as in a single application-specific integrated circuit (ASIC), or in two or more separate devices with appropriate hardware and / or software interfaces between them. Several of the functions may be implemented on a processor shared with other functional components of a base station, for example.

[0388] Alternatively, several of the functional elements of the processing means discussed may be provided through the use of dedicated hardware, while others are provided with hardware for executing software, in association with the appropriate software or firmware. Thus, the term “processor” or “controller” as used herein does not exclusively refer to hardware capable of executing software and may implicitly include, without limitation, digital signal processor (DSP) hardware, read-only memory (ROM) for storing software, random-access memory for storing software and / or program or application data, and nonvolatile memory. Other hardware, conventional and / or custom, may also be included. Designers of communications receivers will appreciate the cost, performance, and maintenance trade-offs inherent in these design choices.

[0389] Any appropriate steps, methods, features, functions, or benefits disclosed herein may be performed through one or more functional units or modules of one or more virtual apparatuses. Each virtual apparatus may comprise a number of these functional units. These functional units may be implemented via processing circuitry, which may include one or more microprocessors or microcontrollers, as well as other digital hardware, which may include digital signal processors (DSPs), special-purpose digital logic, and the like. The processing circuitry may be configured to execute program code stored in memory, which may include one or several types of memory such as read-only memory (ROM), random-access memory (RAM), cache memory, flash memory devices, optical storage devices, etc. Program code stored in memory includes program instructions for executing one or more telecommunications and / or data communications protocols as well as instructions for carrying out one or more of the techniques described herein. In some implementations, the processing circuitry may be used to cause the respective functional unit to perform corresponding functions according to one or more embodiments of the present disclosure.

[0390] ADDITIONAL EXPLANATION

[0391] Some of the embodiments contemplated herein will now be described more fully with reference to the accompanying drawings. Embodiments are provided by way of example to convey the scope of the subject matter to those skilled in the art.

[0392] Figure 17 shows an example of a communication system QQ100 in accordance with some embodiments.

[0393] In the example, the communication system QQ100 includes a telecommunication network QQ102 that includes an access network QQ104, such as a radio access network (RAN), and a core network QQ106, which includes one or more core network nodes QQ108. The access network QQ104 includes one or more access network nodes, such as network nodes QQ110a and QQ110b (one or more of which may be generally referred to as network nodes QQ110), or any other similar 3rd Generation Partnership Project (3GPP) access nodes or non-3GPP access points. Moreover, as will be appreciated by those of skill in the art, a network node is not necessarily limited to an implementation in which a radio portion and a baseband portion are supplied and integrated by a single vendor. Thus, it will be understood that network nodes include disaggregated implementations or portions thereof. For example, in some embodiments, the telecommunication network QQ102 includes one or more Open-RAN (ORAN) network nodes. An ORAN network node is a node in the telecommunication network QQ102 that supports an ORAN specification (e.g., a specification published by the O-RAN Alliance, or any similar organization) and may operate alone or together with other nodes to implement one or more functionalities of any node in the telecommunication network QQ102, including one or more network nodes QQ110 and / or core network nodes QQ108. Examples of an ORAN network node include an open radio unit (O-RU), an open distributed unit (O-DU), an open central unit (O-CU), including an O-CU control plane (O-CU-CP) or an O-CU user plane (O-CU-UP), a RAN intelligent controller (near-real time or non-real time) hosting software or software plug-ins, such as a near-real time control application (e.g., xApp) or a non-real time control application (e.g., rApp), or any combination thereof (the adjective “open” designating support of an ORAN specification). The network node may support a specification by, for example, supporting an interface defined by the ORAN specification, such as an A1, F1, W1, E1, E2, X2, Xn interface, an open fronthaul user plane interface, or an open fronthaul management plane interface. Moreover, an ORAN access node may be a logical node in a physical node. Furthermore, an ORAN network node may be implemented in a virtualization environment (described further below) in which one or more network functions are virtualized. For example, the virtualization environment may include an O-Cloud computing platform orchestrated by a Service Management and Orchestration Framework via an O2 interface defined by the O-RAN Alliance or comparable technologies. The network nodes QQ110 facilitate direct or indirect connection of user equipment (UE), such as by connecting UEs QQ112a, QQ112b, QQ112c, and QQ112d (one or more of which may be generally referred to as UEs QQ112) to the core network QQ106 over one or more wireless connections.

[0394] Example wireless communications over a wireless connection include transmitting and / or receiving wireless signals using electromagnetic waves, radio waves, infrared waves, and / or other types of signals suitable for conveying information without the use of wires, cables, or other material conductors. Moreover, in different embodiments, the communication system QQ100 may include any number of wired or wireless networks, network nodes, UEs, and / or any other components or systems that may facilitate or participate in the communication of data and / or signals whether via wired or wireless connections. The communication system QQ100 may include and / or interface with any type of communication, telecommunication, data, cellular, radio network, and / or other similar type of system.

[0395] The UEs QQ112 may be any of a wide variety of communication devices, including wireless devices arranged, configured, and / or operable to communicate wirelessly with the network nodes QQ110 and other communication devices. Similarly, the network nodes QQ110 are arranged, capable, configured, and / or operable to communicate directly or indirectly with the UEs QQ112 and / or with other network nodes or equipment in the telecommunication network QQ102 to enable and / or provide network access, such as wireless network access, and / or to perform other functions, such as administration in the telecommunication network QQ102.

[0396] In the depicted example, the core network QQ106 connects the network nodes QQ110 to one or more host computing systems, such as host QQ116. These connections may be direct or indirect via one or more intermediary networks or devices. In other examples, network nodes may be directly coupled to hosts. The core network QQ106 includes one or more core network nodes (e.g., core network node QQ108) that are structured with hardware and software components. Features of these components may be substantially similar to those described with respect to the UEs, network nodes, and / or hosts, such that the descriptions thereof are generally applicable to the corresponding components of the core network node QQ108. Example core network nodes include functions of one or more of a Mobile Switching Center (MSC), Mobility Management Entity (MME), Home Subscriber Server (HSS), Access and Mobility Management Function (AMF), Session Management Function (SMF), Authentication Server Function (AUSF), Subscription Identifier De-concealing function (SIDF), Unified Data Management (UDM), Security Edge Protection Proxy (SEPP), Network Exposure Function (NEF), and / or a User Plane Function (UPF).

[0397] The host QQ116 may be under the ownership or control of a service provider other than an operator or provider of the access network QQ104 and / or the telecommunication network QQ102. The host QQ116 may host a variety of applications to provide one or more services. Examples of such applications include live and pre-recorded audio / video content, data collection services such as retrieving and compiling data on various ambient conditions detected by a plurality of UEs, analytics functionality, social media, functions for controlling or otherwise interacting with remote devices, functions for an alarm and surveillance center, or any other such function performed by a server.

[0398] As a whole, the communication system QQ100 of Figure 17 enables connectivity between the UEs, network nodes, and hosts. In that sense, the communication system may be configured to operate according to predefined rules or procedures, such as specific standards that include, but are not limited to: Global System for Mobile Communications (GSM); Universal Mobile Telecommunications System (UMTS); Long Term Evolution (LTE), and / or other suitable 2G, 3G, 4G, 5G standards, or any applicable future generation standard (e.g., 6G); wireless local area network (WLAN) standards, such as the Institute of Electrical and Electronics Engineers (IEEE) 802.11 standards (WiFi); and / or any other appropriate wireless communication standard, such as the Worldwide Interoperability for Microwave Access (WiMax), Bluetooth, Z-Wave, Near Field Communication (NFC), ZigBee, LiFi, and / or any low-power wide-area network (LPWAN) standards such as LoRa and Sigfox.

[0399] In some examples, the telecommunication network QQ102 is a cellular network that implements 3GPP standardized features. Accordingly, the telecommunication network QQ102 may support network slicing to provide different logical networks to different devices that are connected to the telecommunication network QQ102. For example, the telecommunication network QQ102 may provide Ultra Reliable Low Latency Communication (URLLC) services to some UEs, while providing Enhanced Mobile Broadband (eMBB) services to other UEs, and / or Massive Machine Type Communication (mMTC) / Massive IoT services to yet further UEs.

[0400] In some examples, the UEs QQ112 are configured to transmit and / or receive information without direct human interaction. For instance, a UE may be designed to transmit information to the access network QQ104 on a predetermined schedule, when triggered by an internal or external event, or in response to requests from the access network QQ104. Additionally, a UE may be configured for operating in single- or multi-RAT or multi-standard mode. For example, a UE may operate with any one or combination of Wi-Fi, NR (New Radio) and LTE, i.e. being configured for multi-radio dual connectivity (MR-DC), such as E-UTRAN (Evolved-UMTS Terrestrial Radio Access Network) New Radio - Dual Connectivity (EN-DC).

[0401] In the example, the hub QQ114 communicates with the access network QQ104 to facilitate indirect communication between one or more UEs (e.g., UE QQ112c and / or QQ112d) and network nodes (e.g., network node QQ110b). In some examples, the hub QQ114 may be a controller, router, content source and analytics, or any of the other communication devices described herein regarding UEs. For example, the hub QQ114 may be a broadband router enabling access to the core network QQ106 for the UEs. As another example, the hub QQ114 may be a controller that sends commands or instructions to one or more actuators in the UEs. Commands or instructions may be received from the UEs, network nodes QQ110, or by executable code, script, process, or other instructions in the hub QQ114. As another example, the hub QQ114 may be a data collector that acts as temporary storage for UE data and, in some embodiments, may perform analysis or other processing of the data. As another example, the hub QQ114 may be a content source. For example, for a UE that is a VR device, display, loudspeaker, or other media delivery device, the hub QQ114 may retrieve VR assets, video, audio, or other media or data related to sensory information via a network node, which the hub QQ114 then provides to the UE either directly, after performing local processing, and / or after adding additional local content. In still another example, the hub QQ114 acts as a proxy server or orchestrator for the UEs, in particular if one or more of the UEs are low energy IoT devices.

[0402] The hub QQ114 may have a constant / persistent or intermittent connection to the network node QQ110b. The hub QQ114 may also allow for a different communication scheme and / or schedule between the hub QQ114 and UEs (e.g., UE QQ112c and / or QQ112d), and between the hub QQ114 and the core network QQ106. In other examples, the hub QQ114 is connected to the core network QQ106 and / or one or more UEs via a wired connection. Moreover, the hub QQ114 may be configured to connect to an M2M service provider over the access network QQ104 and / or to another UE over a direct connection. In some scenarios, UEs may establish a wireless connection with the network nodes QQ110 while still connected via the hub QQ114 via a wired or wireless connection. In some embodiments, the hub QQ114 may be a dedicated hub - that is, a hub whose primary function is to route communications to / from the UEs from / to the network node QQ110b. In other embodiments, the hub QQ114 may be a non-dedicated hub - that is, a device which is capable of operating to route communications between the UEs and network node QQ110b, but which is additionally capable of operating as a communication start and / or end point for certain data channels.

[0403] Figure 18 shows a UE QQ200 in accordance with some embodiments. The UE QQ200 presents additional details of some embodiments of the UE QQ112 of Figure 17. As used herein, a UE refers to a device capable, configured, arranged and / or operable to communicate wirelessly with network nodes and / or other UEs. Examples of a UE include, but are not limited to, a smart phone, mobile phone, cell phone, voice over IP (VoIP) phone, wireless local loop phone, desktop computer, personal digital assistant (PDA), wireless cameras, gaming console or device, music storage / playback device, wearable terminal device, wireless endpoint, mobile station, tablet, laptop, laptop-embedded equipment (LEE), laptop-mounted equipment (LME), an Augmented Reality (AR) or Virtual Reality (VR) device, wireless customer-premise equipment (CPE), vehicle, vehicle-mounted or vehicle embedded / integrated wireless device, etc. Other examples include any UE identified by the 3rd Generation Partnership Project (3GPP), including a narrow band internet of things (NB-IoT) UE, a machine type communication (MTC) UE, and / or an enhanced MTC (eMTC) UE.

[0404] A UE may support device-to-device (D2D) communication, for example by implementing a 3GPP standard for sidelink communication, Dedicated Short-Range Communication (DSRC), vehicle-to-vehicle (V2V), vehicle-to-infrastructure (V2I), or vehicle-to-everything (V2X). In other examples, a UE may not necessarily have a user in the sense of a human user who owns and / or operates the relevant device. Instead, a UE may represent a device that is intended for sale to, or operation by, a human user but which may not, or which may not initially, be associated with a specific human user (e.g., a smart sprinkler controller). Alternatively, a UE may represent a device that is not intended for sale to, or operation by, an end user but which may be associated with or operated for the benefit of a user (e.g., a smart power meter).

[0405] The UE QQ200 includes processing circuitry QQ202 that is operatively coupled via a bus QQ204 to an input / output interface QQ206, a power source QQ208, a memory QQ210, a communication interface QQ212, and / or any other component, or any combination thereof. Certain UEs may utilize all or a subset of the components shown in Figure 18. The level of integration between the components may vary from one UE to another UE. Further, certain UEs may contain multiple instances of a component, such as multiple processors, memories, transceivers, transmitters, receivers, etc.

[0406] The processing circuitry QQ202 is configured to process instructions and data and may be configured to implement any sequential state machine operative to execute instructions stored as machine-readable computer programs in the memory QQ210. The processing circuitry QQ202 may be implemented as one or more hardware-implemented state machines (e.g., in discrete logic, field-programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), etc.); programmable logic together with appropriate firmware; one or more stored computer programs, general-purpose processors, such as a microprocessor or digital signal processor (DSP), together with appropriate software; or any combination of the above. For example, the processing circuitry QQ202 may include multiple central processing units (CPUs).

[0407] In the example, the input / output interface QQ206 may be configured to provide an interface or interfaces to an input device, output device, or one or more input and / or output devices. Examples of an output device include a speaker, a sound card, a video card, a display, a monitor, a printer, an actuator, an emitter, a smartcard, another output device, or any combination thereof. An input device may allow a user to capture information into the UE QQ200. Examples of an input device include a touch-sensitive or presence-sensitive display, a camera (e.g., a digital camera, a digital video camera, a web camera, etc.), a microphone, a sensor, a mouse, a trackball, a directional pad, a trackpad, a scroll wheel, a smartcard, and the like. The presence-sensitive display may include a capacitive or resistive touch sensor to sense input from a user. A sensor may be, for instance, an accelerometer, a gyroscope, a tilt sensor, a force sensor, a magnetometer, an optical sensor, a proximity sensor, a biometric sensor, etc., or any combination thereof. An output device may use the same type of interface port as an input device. For example, a Universal Serial Bus (USB) port may be used to provide an input device and an output device.

[0408] In some embodiments, the power source QQ208 is structured as a battery or battery pack. Other types of power sources, such as an external power source (e.g., an electricity outlet), photovoltaic device, or power cell, may be used. The power source QQ208 may further include power circuitry for delivering power from the power source QQ208 itself, and / or an external power source, to the various parts of the UE QQ200 via input circuitry or an interface such as an electrical power cable. Delivering power may be, for example, for charging of the power source QQ208. Power circuitry may perform any formatting, converting, or other modification to the power from the power source QQ208 to make the power suitable for the respective components of the UE QQ200 to which power is supplied.

[0409] The memory QQ210 may be or be configured to include memory such as random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), magnetic disks, optical disks, hard disks, removable cartridges, flash drives, and so forth. In one example, the memory QQ210 includes one or more application programs QQ214, such as an operating system, web browser application, a widget, gadget engine, or other application, and corresponding data QQ216. The memory QQ210 may store, for use by the UE QQ200, any of a variety of operating systems or combinations of operating systems.

[0410] The memory QQ210 may be configured to include a number of physical drive units, such as redundant array of independent disks (RAID), flash memory, USB flash drive, external hard disk drive, thumb drive, pen drive, key drive, high-density digital versatile disc (HD-DVD) optical disc drive, internal hard disk drive, Blu-Ray optical disc drive, holographic digital data storage (HDDS) optical disc drive, external mini-dual in-line memory module (DIMM), synchronous dynamic random access memory (SDRAM), external micro-DIMM SDRAM, smartcard memory such as tamper resistant module in the form of a universal integrated circuit card (UICC) including one or more subscriber identity modules (SIMs), such as a USIM and / or ISIM, other memory, or any combination thereof. The UICC may for example be an embedded UICC (eUICC), integrated UICC (iUICC) or a removable UICC commonly known as ‘SIM card.’ The memory QQ210 may allow the UE QQ200 to access instructions, application programs and the like, stored on transitory or non-transitory memory media, to off-load data, or to upload data. An article of manufacture, such as one utilizing a communication system may be tangibly embodied as or in the memory QQ210, which may be or comprise a device-readable storage medium.

[0411] The processing circuitry QQ202 may be configured to communicate with an access network or other network using the communication interface QQ212. The communication interface QQ212 may comprise one or more communication subsystems and may include or be communicatively coupled to an antenna QQ222. The communication interface QQ212 may include one or more transceivers used to communicate, such as by communicating with one or more remote transceivers of another device capable of wireless communication (e.g., another UE or a network node in an access network). Each transceiver may include a transmitter QQ218 and / or a receiver QQ220 appropriate to provide network communications (e.g., optical, electrical, frequency allocations, and so forth). Moreover, the transmitter QQ218 and receiver QQ220 may be coupled to one or more antennas (e.g., antenna QQ222) and may share circuit components, software or firmware, or alternatively be implemented separately.

[0412] In the illustrated embodiment, communication functions of the communication interface QQ212 may include cellular communication, Wi-Fi communication, LPWAN communication, data communication, voice communication, multimedia communication, short-range communications such as Bluetooth, near-field communication, location-based communication such as the use of the global positioning system (GPS) to determine a location, another like communication function, or any combination thereof.

[0413] Communications may be implemented according to one or more communication protocols and / or standards, such as IEEE 802.11, Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), GSM, LTE, New Radio (NR), UMTS, WiMax, Ethernet, transmission control protocol / internet protocol (TCP / IP), synchronous optical networking (SONET), Asynchronous Transfer Mode (ATM), QUIC, Hypertext Transfer Protocol (HTTP), and so forth.

[0414] Regardless of the type of sensor, a UE may provide an output of data captured by its sensors, through its communication interface QQ212, via a wireless connection to a network node. Data captured by sensors of a UE can be communicated through a wireless connection to a network node via another UE. The output may be periodic (e.g., once every 15 minutes if it reports the sensed temperature), random (e.g., to even out the load from reporting from several sensors), in response to a triggering event (e.g., when moisture is detected an alert is sent), in response to a request (e.g., a user initiated request), or a continuous stream (e.g., a live video feed of a patient). As another example, a UE comprises an actuator, a motor, or a switch, related to a communication interface configured to receive wireless input from a network node via a wireless connection. In response to the received wireless input the states of the actuator, the motor, or the switch may change. For example, the UE may comprise a motor that adjusts the control surfaces or rotors of a drone in flight according to the received input, or that moves a robotic arm performing a medical procedure according to the received input.

[0415] A UE, when in the form of an Internet of Things (IoT) device, may be a device for use in one or more application domains, these domains comprising, but not limited to, city wearable technology, extended industrial application and healthcare. Non-limiting examples of such an IoT device are a device which is or which is embedded in: a connected refrigerator or freezer, a TV, a connected lighting device, an electricity meter, a robot vacuum cleaner, a voice controlled smart speaker, a home security camera, a motion detector, a thermostat, a smoke detector, a door / window sensor, a flood / moisture sensor, an electrical door lock, a connected doorbell, an air conditioning system like a heat pump, an autonomous vehicle, a surveillance system, a weather monitoring device, a vehicle parking monitoring device, an electric vehicle charging station, a smartwatch, a fitness tracker, a wearable for tactile augmentation or sensory enhancement, a water sprinkler, an animal- or item-tracking device, a sensor for monitoring a plant or animal, an industrial robot, an Unmanned Aerial Vehicle (UAV), and any kind of medical device, like a heart rate monitor or a remote controlled surgical robot. A UE in the form of an IoT device comprises circuitry and / or software in dependence of the intended application of the IoT device in addition to other components as described in relation to the UE QQ200 shown in Figure 18.

[0416] As yet another specific example, in an IoT scenario, a UE may represent a machine or other device that performs monitoring and / or measurements, and transmits the results of such monitoring and / or measurements to another UE and / or a network node. The UE may in this case be an M2M device, which may in a 3GPP context be referred to as an MTC device. As one particular example, the UE may implement the 3GPP NB-IoT standard. In other scenarios, a UE may represent a vehicle, such as a car, a bus, a truck, a ship or an airplane, or other equipment that is capable of monitoring and / or reporting on its operational status or other functions associated with its operation.

[0417] In practice, any number of UEs may be used together with respect to a single use case. For example, a first UE might be or be integrated in a drone and provide the drone’s speed information (obtained through a speed sensor) to a second UE that is a remote controller operating the drone. When the user makes changes from the remote controller, the first UE may adjust the throttle on the drone (e.g. by controlling an actuator) to increase or decrease the drone’s speed. The first and / or the second UE can also include more than one of the functionalities described above. For example, a UE might comprise the sensor and the actuator, and handle communication of data for both the speed sensor and the actuators.

[0418] Figure 19 shows a network node QQ300 in accordance with some embodiments. As used herein, network node refers to equipment capable, configured, arranged and / or operable to communicate directly or indirectly with a UE and / or with other network nodes or equipment, in a telecommunication network. Examples of network nodes include, but are not limited to, access points (APs) (e.g., radio access points), base stations (BSs) (e.g., radio base stations, Node Bs, evolved Node Bs (eNBs) and NR NodeBs (gNBs)), O-RAN nodes or components of an O-RAN node (e.g., O-RU, O-DU, O-CU).

[0419] Base stations may be categorized based on the amount of coverage they provide (or, stated differently, their transmit power level) and so, depending on the provided amount of coverage, may be referred to as femto base stations, pico base stations, micro base stations, or macro base stations. A base station may be a relay node or a relay donor node controlling a relay. A network node may also include one or more (or all) parts of a distributed radio base station such as centralized digital units, distributed units (e.g., in an O-RAN access node) and / or remote radio units (RRUs), sometimes referred to as Remote Radio Heads (RRHs). Such remote radio units may or may not be integrated with an antenna as an antenna integrated radio. Parts of a distributed radio base station may also be referred to as nodes in a distributed antenna system (DAS).

[0420] Other examples of network nodes include multiple transmission point (multi-TRP) 5G access nodes, multi-standard radio (MSR) equipment such as MSR BSs, network controllers such as radio network controllers (RNCs) or base station controllers (BSCs), base transceiver stations (BTSs), transmission points, transmission nodes, multi-cell / multicast coordination entities (MCEs), Operation and Maintenance (O&M) nodes, Operations Support System (OSS) nodes, Self-Organizing Network (SON) nodes, positioning nodes (e.g., Evolved Serving Mobile Location Centers (E-SMLCs)), and / or Minimization of Drive Tests (MDTs).

[0421] The network node QQ300 includes a processing circuitry QQ302, a memory QQ304, a communication interface QQ306, and a power source QQ308. The network node QQ300 may be composed of multiple physically separate components (e.g., a NodeB component and an RNC component, or a BTS component and a BSC component, etc.), which may each have their own respective components. In certain scenarios in which the network node QQ300 comprises multiple separate components (e.g., BTS and BSC components), one or more of the separate components may be shared among several network nodes. For example, a single RNC may control multiple NodeBs. In such a scenario, each unique NodeB and RNC pair may in some instances be considered a single separate network node. In some embodiments, the network node QQ300 may be configured to support multiple radio access technologies (RATs). In such embodiments, some components may be duplicated (e.g., separate memory QQ304 for different RATs) and some components may be reused (e.g., a same antenna QQ310 may be shared by different RATs). The network node QQ300 may also include multiple sets of the various illustrated components for different wireless technologies integrated into network node QQ300, for example GSM, WCDMA, LTE, NR, WiFi, Zigbee, Z-wave, LoRaWAN, Radio Frequency Identification (RFID) or Bluetooth wireless technologies. These wireless technologies may be integrated into the same or different chip or set of chips and other components within network node QQ300.

[0422] The processing circuitry QQ302 may comprise a combination of one or more of a microprocessor, controller, microcontroller, central processing unit, digital signal processor, application-specific integrated circuit, field programmable gate array, or any other suitable computing device, resource, or combination of hardware, software and / or encoded logic operable, either alone or in conjunction with other network node QQ300 components such as the memory QQ304, to provide network node QQ300 functionality.

[0423] In some embodiments, the processing circuitry QQ302 includes a system on a chip (SOC). In some embodiments, the processing circuitry QQ302 includes one or more of radio frequency (RF) transceiver circuitry QQ312 and baseband processing circuitry QQ314. In some embodiments, the radio frequency (RF) transceiver circuitry QQ312 and the baseband processing circuitry QQ314 may be on separate chips (or sets of chips), boards, or units, such as radio units and digital units. In alternative embodiments, part or all of RF transceiver circuitry QQ312 and baseband processing circuitry QQ314 may be on the same chip or set of chips, boards, or units.

[0424] The memory QQ304 may comprise any form of volatile or non-volatile computer-readable memory including, without limitation, persistent storage, solid-state memory, remotely mounted memory, magnetic media, optical media, random access memory (RAM), read-only memory (ROM), mass storage media (for example, a hard disk), removable storage media (for example, a flash drive, a Compact Disk (CD) or a Digital Video Disk (DVD)), and / or any other volatile or non-volatile, non-transitory device-readable and / or computer-executable memory devices that store information, data, and / or instructions that may be used by the processing circuitry QQ302. The memory QQ304 may store any suitable instructions, data, or information, including a computer program, software, an application including one or more of logic, rules, code, tables, and / or other instructions capable of being executed by the processing circuitry QQ302 and utilized by the network node QQ300. The memory QQ304 may be used to store any calculations made by the processing circuitry QQ302 and / or any data received via the communication interface QQ306. In some embodiments, the processing circuitry QQ302 and memory QQ304 are integrated.

[0425] The communication interface QQ306 is used in wired or wireless communication of signaling and / or data between a network node, access network, and / or UE. As illustrated, the communication interface QQ306 comprises port(s) / terminal(s) QQ316 to send and receive data, for example to and from a network over a wired connection. The communication interface QQ306 also includes radio front-end circuitry QQ318 that may be coupled to, or in certain embodiments a part of, the antenna QQ310. Radio front-end circuitry QQ318 comprises filters QQ320 and amplifiers QQ322. The radio front-end circuitry QQ318 may be connected to an antenna QQ310 and processing circuitry QQ302. The radio front-end circuitry may be configured to condition signals communicated between antenna QQ310 and processing circuitry QQ302. The radio front-end circuitry QQ318 may receive digital data that is to be sent out to other network nodes or UEs via a wireless connection. The radio front-end circuitry QQ318 may convert the digital data into a radio signal having the appropriate channel and bandwidth parameters using a combination of filters QQ320 and / or amplifiers QQ322. The radio signal may then be transmitted via the antenna QQ310. Similarly, when receiving data, the antenna QQ310 may collect radio signals which are then converted into digital data by the radio front-end circuitry QQ318. The digital data may be passed to the processing circuitry QQ302. In other embodiments, the communication interface may comprise different components and / or different combinations of components.

[0426] In certain alternative embodiments, the network node QQ300 does not include separate radio front-end circuitry QQ318; instead, the processing circuitry QQ302 includes radio front-end circuitry and is connected to the antenna QQ310. Similarly, in some embodiments, all or some of the RF transceiver circuitry QQ312 is part of the communication interface QQ306. In still other embodiments, the communication interface QQ306 includes one or more ports or terminals QQ316, the radio front-end circuitry QQ318, and the RF transceiver circuitry QQ312, as part of a radio unit (not shown), and the communication interface QQ306 communicates with the baseband processing circuitry QQ314, which is part of a digital unit (not shown).

[0427] The antenna QQ310 may include one or more antennas, or antenna arrays, configured to send and / or receive wireless signals. The antenna QQ310 may be coupled to the radio front-end circuitry QQ318 and may be any type of antenna capable of transmitting and receiving data and / or signals wirelessly. In certain embodiments, the antenna QQ310 is separate from the network node QQ300 and connectable to the network node QQ300 through an interface or port.

[0428] The antenna QQ310, communication interface QQ306, and / or the processing circuitry QQ302 may be configured to perform any receiving operations and / or certain obtaining operations described herein as being performed by the network node. Any information, data and / or signals may be received from a UE, another network node and / or any other network equipment. Similarly, the antenna QQ310, the communication interface QQ306, and / or the processing circuitry QQ302 may be configured to perform any transmitting operations described herein as being performed by the network node. Any information, data and / or signals may be transmitted to a UE, another network node and / or any other network equipment.

[0429] The power source QQ308 provides power to the various components of network node QQ300 in a form suitable for the respective components (e.g., at a voltage and current level needed for each respective component). The power source QQ308 may further comprise, or be coupled to, power management circuitry to supply the components of the network node QQ300 with power for performing the functionality described herein. For example, the network node QQ300 may be connectable to an external power source (e.g., the power grid, an electricity outlet) via an input circuitry or interface such as an electrical cable, whereby the external power source supplies power to power circuitry of the power source QQ308. As a further example, the power source QQ308 may comprise a source of power in the form of a battery or battery pack which is connected to, or integrated in, power circuitry. The battery may provide backup power should the external power source fail.

[0430] Embodiments of the network node QQ300 may include additional components beyond those shown in Figure 19 for providing certain aspects of the network node’s functionality, including any of the functionality described herein and / or any functionality necessary to support the subject matter described herein. For example, the network node QQ300 may include user interface equipment to allow input of information into the network node QQ300 and to allow output of information from the network node QQ300. This may allow a user to perform diagnostic, maintenance, repair, and other administrative functions for the network node QQ300. In some embodiments providing a core network node, such as core network node 108 of FIG. QQ1, some components, such as the radio front-end circuitry QQ318 and the RF transceiver circuitry QQ312, may be omitted.

[0431] Figure 20 is a block diagram illustrating a virtualization environment QQ400 in which functions implemented by some embodiments may be virtualized. In the present context, virtualizing means creating virtual versions of apparatuses or devices which may include virtualizing hardware platforms, storage devices and networking resources. As used herein, virtualization can be applied to any device described herein, or components thereof, and relates to an implementation in which at least a portion of the functionality is implemented as one or more virtual components. Some or all of the functions described herein may be implemented as virtual components executed by one or more virtual machines (VMs) implemented in one or more virtual environments QQ400 hosted by one or more hardware nodes, such as a hardware computing device that operates as a network node, UE, core network node, or host. Further, in embodiments in which the virtual node does not require radio connectivity (e.g., a core network node or host), the node may be entirely virtualized. In some embodiments, the virtualization environment QQ400 includes components defined by the O-RAN Alliance, such as an O-Cloud environment orchestrated by a Service Management and Orchestration Framework via an O2 interface. Virtualization may facilitate distributed implementations of a network node, UE, core network node, or host.

[0432] Applications QQ402 (which may alternatively be called software instances, virtual appliances, network functions, virtual nodes, virtual network functions, etc.) are run in the virtualization environment QQ400 to implement some of the features, functions, and / or benefits of some of the embodiments disclosed herein.

[0433] Hardware QQ404 includes processing circuitry, memory that stores software and / or instructions executable by hardware processing circuitry, and / or other hardware devices as described herein, such as a network interface, input / output interface, and so forth. Software may be executed by the processing circuitry to instantiate one or more virtualization layers QQ406 (also referred to as hypervisors or virtual machine monitors (VMMs)), provide VMs QQ408a and QQ408b (one or more of which may be generally referred to as VMs QQ408), and / or perform any of the functions, features and / or benefits described in relation with some embodiments described herein. The virtualization layer QQ406 may present a virtual operating platform that appears like networking hardware to the VMs QQ408.

[0434] The VMs QQ408 comprise virtual processing, virtual memory, virtual networking or interface and virtual storage, and may be run by a corresponding virtualization layer QQ406. Different embodiments of the instance of a virtual appliance QQ402 may be implemented on one or more of the VMs QQ408, and the implementations may be made in different ways. Virtualization of the hardware is in some contexts referred to as network function virtualization (NFV). NFV may be used to consolidate many network equipment types onto industry standard high volume server hardware, physical switches, and physical storage, which can be located in data centers and customer premises equipment.

[0435] In the context of NFV, a VM QQ408 may be a software implementation of a physical machine that runs programs as if they were executing on a physical, non-virtualized machine. Each of the VMs QQ408, and that part of hardware QQ404 that executes that VM, be it hardware dedicated to that VM and / or hardware shared by that VM with others of the VMs, forms a separate virtual network element. Still in the context of NFV, a virtual network function is responsible for handling specific network functions that run in one or more VMs QQ408 on top of the hardware QQ404 and corresponds to the application QQ402.

[0436] Hardware QQ404 may be implemented in a standalone network node with generic or specific components. Hardware QQ404 may implement some functions via virtualization. Alternatively, hardware QQ404 may be part of a larger cluster of hardware (e.g., in a data center or CPE) where many hardware nodes work together and are managed via management and orchestration QQ410, which, among others, oversees lifecycle management of applications QQ402. In some embodiments, hardware QQ404 is coupled to one or more radio units that each include one or more transmitters and one or more receivers that may be coupled to one or more antennas. Radio units may communicate directly with other hardware nodes via one or more appropriate network interfaces and may be used in combination with the virtual components to provide a virtual node with radio capabilities, such as a radio access node or a base station. In some embodiments, some signaling can be provided with the use of a control system QQ412 which may alternatively be used for communication between hardware nodes and radio units.

[0437] Although the computing devices described herein (e.g., UEs, network nodes) may include the illustrated combination of hardware components, other embodiments may comprise computing devices with different combinations of components. It is to be understood that these computing devices may comprise any suitable combination of hardware and / or software needed to perform the tasks, features, functions and methods disclosed herein. Determining, calculating, obtaining or similar operations described herein may be performed by processing circuitry, which may process information by, for example, converting the obtained information into other information, comparing the obtained information or converted information to information stored in the network node, and / or performing one or more operations based on the obtained information or converted information, and as a result of said processing making a determination. Moreover, while components are depicted as single boxes located within a larger box, or nested within multiple boxes, in practice, computing devices may comprise multiple different physical components that make up a single illustrated component, and functionality may be partitioned between separate components. For example, a communication interface may be configured to include any of the components described herein, and / or the functionality of the components may be partitioned between the processing circuitry and the communication interface. In another example, non-computationally intensive functions of any of such components may be implemented in software or firmware and computationally intensive functions may be implemented in hardware.

[0438] In certain embodiments, some or all of the functionality described herein may be provided by processing circuitry executing instructions stored in memory, which in certain embodiments may be a computer program product in the form of a non-transitory computer-readable storage medium. In alternative embodiments, some or all of the functionality may be provided by the processing circuitry without executing instructions stored on a separate or discrete device-readable storage medium, such as in a hard-wired manner. In any of those particular embodiments, whether executing instructions stored on a non-transitory computer-readable storage medium or not, the processing circuitry can be configured to perform the described functionality. The benefits provided by such functionality are not limited to the processing circuitry alone or to other components of the computing device, but are enjoyed by the computing device as a whole, and / or by end users and a wireless network generally.

[0439] When using the word “comprise” or “comprising”, it shall be interpreted as non-limiting, i.e. meaning “consist at least of”. The embodiments herein are not limited to the preferred embodiments described above. Various alternatives, modifications and equivalents may be used.

Claims

1. A method for handling training of a Machine Learning, ML, model in a communication network (100), the method comprising:
calculating (702) a respective training score for one or more first clients from a set of clients, the one or more first clients having performed training of the ML model,
updating (703) a set of dictionaries based on the respective training scores,
selecting (704) one or more subsequent clients from the set of clients based on the set of dictionaries, and
responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by the one or more first clients, not achieving convergence, initiating (705) further training of the ML model by the one or more subsequent clients, or
responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by one or more clients, achieving convergence, stopping (710) training the ML model.

2. The method according to claim 1, wherein, responsive to initiating (705) further training of the ML model, the method further comprises:
calculating (706) a respective subsequent training score for the one or more subsequent clients from the set of clients, the one or more subsequent clients having performed training of the ML model,
updating (707) the set of dictionaries based on the respective subsequent training scores,
selecting (708) one or more further subsequent clients from the set of clients based on the set of dictionaries, and
responsive to a subsequent aggregated ML model, which subsequent aggregated model is aggregated from local models trained by the one or more subsequent clients, not achieving convergence, initiating (709) further training of the ML model by the one or more further subsequent clients, or
responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by one or more clients, achieving convergence, stopping (710) training the ML model.
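The iterative procedure of claims 1 and 2 can be pictured as a loop. The following is a minimal Python sketch under stated assumptions, not the claimed implementation: models are plain lists of floats, aggregation is an equal-weight federated average, "convergence" is taken to mean that the largest parameter change between rounds falls below a tolerance `tol`, and clients that have not yet trained get an infinite score so that each one is tried at least once. The set of dictionaries is reduced to a single score dictionary, and the names `run_training`, `local_train`, `score_client`, `k` and `max_rounds` are hypothetical.

```python
from typing import Callable, Dict, List

Model = List[float]


def fed_avg(models: List[Model]) -> Model:
    """Equal-weight element-wise mean of the local models (assumed aggregation rule)."""
    return [sum(params) / len(models) for params in zip(*models)]


def run_training(
    global_model: Model,
    clients: List[str],
    local_train: Callable[[str, Model], Model],
    score_client: Callable[[str], float],
    k: int = 2,
    tol: float = 1e-3,
    max_rounds: int = 50,
) -> Model:
    # Score dictionary; untried clients start at +inf so they are explored first (assumption).
    scores: Dict[str, float] = {c: float("inf") for c in clients}
    selected = sorted(clients, key=scores.__getitem__, reverse=True)[:k]
    for _ in range(max_rounds):
        # The selected clients train locally and their models are aggregated.
        new_model = fed_avg([local_train(c, global_model) for c in selected])
        delta = max(abs(a - b) for a, b in zip(new_model, global_model))
        global_model = new_model
        # Steps 702/706 and 703/707: score the clients that trained, update the dictionary.
        for c in selected:
            scores[c] = score_client(c)
        # Steps 704/708: select the next clients by descending score.
        selected = sorted(clients, key=scores.__getitem__, reverse=True)[:k]
        # Step 710: stop once the aggregated model has converged;
        # otherwise (steps 705/709) the loop continues with the new selection.
        if delta < tol:
            break
    return global_model


if __name__ == "__main__":
    targets = {"ue1": 1.0, "ue2": 2.0, "ue3": 3.0, "ue4": 4.0}
    resources = {"ue1": 0.2, "ue2": 0.9, "ue3": 0.8, "ue4": 0.5}

    def local_train(client: str, model: Model) -> Model:
        # Toy local update: move halfway towards the client's own optimum.
        return [w + 0.5 * (targets[client] - w) for w in model]

    final = run_training([0.0], list(targets), local_train, resources.__getitem__)
    print([round(w, 2) for w in final])
```

In the toy run every client is tried once, after which the two highest-scoring clients (`ue2`, `ue3`) keep being selected and the aggregate settles near the mean of their optima. Without the penalty factor of claims 5 and 6, such a score-only rule tends to reuse the same clients.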

3. The method according to any of claims 1-2, wherein the training score is calculated based on a number of training metrics associated with the respective one or more first clients, and wherein the number of metrics comprises any one or more out of:
- processor load,
- battery,
- memory,
- model drift,
- training time,
- upload speed, and
- energy consumption.

4. The method according to any of claims 1-3, wherein each metric is associated with a weight factor, and wherein the training score is calculated by weighting each metric with its associated weight factor.
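Claims 3 and 4 state that each metric is weighted, but not how metrics are scaled or whether a high value is favourable. A minimal sketch, assuming every metric is pre-normalized to [0, 1], that `battery`, `memory` (read here as free memory) and `upload_speed` are better when high while the remaining metrics are costs and are therefore inverted, and that the training score is the weighted sum. The metric keys, the `HIGHER_IS_BETTER` table and the example weights are hypothetical.

```python
from typing import Dict

# Hypothetical orientation of the claim-3 metrics: True means "higher is better".
# All values are assumed to be pre-normalized to the range [0, 1].
HIGHER_IS_BETTER: Dict[str, bool] = {
    "processor_load": False,
    "battery": True,             # remaining charge
    "memory": True,              # free memory
    "model_drift": False,
    "training_time": False,
    "upload_speed": True,
    "energy_consumption": False,
}


def training_score(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of normalized metrics; cost metrics are inverted before weighting."""
    score = 0.0
    for name, value in metrics.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"metric {name!r} must be normalized to [0, 1]")
        oriented = value if HIGHER_IS_BETTER[name] else 1.0 - value
        score += weights.get(name, 0.0) * oriented
    return score


if __name__ == "__main__":
    weights = {"processor_load": 0.2, "battery": 0.3, "energy_consumption": 0.5}
    client_metrics = {"processor_load": 0.4, "battery": 0.9, "energy_consumption": 0.2}
    print(round(training_score(client_metrics, weights), 3))
```

Metrics without a weight contribute nothing, so an operator could, for example, emphasise energy consumption simply by giving it the largest weight.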

5. The method according to claim 4, wherein calculating the score comprises applying a penalty factor.

6. The method according to claim 5, wherein the penalty factor is increased responsive to a client being selected to train the ML model, and wherein the penalty factor is set to zero responsive to a client not being selected for training the ML model for a number of iterations larger than a threshold.
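Claim 6 fixes only the direction of the penalty updates. A minimal sketch, assuming a fixed additive increment `step` per selection, a hypothetical `reset_threshold`, and that the penalty is subtracted from the training score (claim 5 does not say how it is applied). `PenaltyState`, `update_penalty` and `penalized_score` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class PenaltyState:
    """Per-client penalty bookkeeping (hypothetical structure)."""
    penalty: float = 0.0
    iterations_since_selected: int = 0


def update_penalty(state: PenaltyState, selected: bool,
                   step: float = 0.1, reset_threshold: int = 3) -> PenaltyState:
    """Update one client's penalty after a training iteration (assumed rule)."""
    if selected:
        # A selected client becomes less attractive for the next selection.
        state.penalty += step
        state.iterations_since_selected = 0
    else:
        state.iterations_since_selected += 1
        # Idle for more iterations than the threshold: clear the penalty.
        if state.iterations_since_selected > reset_threshold:
            state.penalty = 0.0
    return state


def penalized_score(raw_score: float, state: PenaltyState) -> float:
    # Assumption: the penalty is subtracted from the weighted training score.
    return raw_score - state.penalty


if __name__ == "__main__":
    s = PenaltyState()
    for was_selected in [True, True, False, False, False, False]:
        update_penalty(s, was_selected)
        print(round(s.penalty, 2), s.iterations_since_selected)
```

With the defaults, two consecutive selections raise the penalty to 0.2; it stays there for three idle iterations and drops to zero on the fourth, because the claim requires the count to exceed the threshold.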

7. The method according to any of claims 1-6, wherein the set of dictionaries comprises a first dictionary and a second dictionary, and wherein the first dictionary is related to the training score of the clients in the set of clients, and wherein the second dictionary is related to a selection history of the clients in the set of clients.

8. The method according to claim 7, wherein the selection history comprises a number of times a client has been selected and a number of iterations of training the ML model since the client was selected, and wherein the second dictionary further comprises the penalty factor.
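A minimal sketch of the two dictionaries of claims 7 and 8, assuming plain Python dicts keyed by a hypothetical client identifier, a subtractive additive penalty, and top-k selection on the penalized score. The field names, `k`, `step` and `reset_threshold` are illustrative assumptions. For clients never selected, the sketch counts iterations since they first appeared, which is one possible reading of "iterations since the client was selected".

```python
from typing import Dict, Iterable, List

History = Dict[str, float]


def new_history() -> History:
    """Fresh entry for the second dictionary (field names are hypothetical)."""
    return {"times_selected": 0, "iterations_since_selected": 0, "penalty": 0.0}


def update_dictionaries(
    score_dict: Dict[str, float],
    history_dict: Dict[str, History],
    new_scores: Dict[str, float],
    trained: Iterable[str],
    step: float = 0.1,
    reset_threshold: int = 3,
) -> None:
    """Update the score dictionary and the selection-history dictionary after one iteration."""
    trained_set = set(trained)
    # First dictionary: latest training score per client.
    score_dict.update(new_scores)
    # Second dictionary: selection history and penalty factor per known client.
    for client in score_dict:
        hist = history_dict.setdefault(client, new_history())
        if client in trained_set:
            hist["times_selected"] += 1
            hist["iterations_since_selected"] = 0
            hist["penalty"] += step
        else:
            hist["iterations_since_selected"] += 1
            if hist["iterations_since_selected"] > reset_threshold:
                hist["penalty"] = 0.0


def select_clients(
    score_dict: Dict[str, float], history_dict: Dict[str, History], k: int
) -> List[str]:
    """Pick the k clients with the highest score after subtracting their penalty."""
    def effective(client: str) -> float:
        return score_dict[client] - history_dict.get(client, new_history())["penalty"]

    return sorted(score_dict, key=effective, reverse=True)[:k]


if __name__ == "__main__":
    scores: Dict[str, float] = {}
    history: Dict[str, History] = {}
    update_dictionaries(scores, history, {"ue1": 0.9, "ue2": 0.7, "ue3": 0.5},
                        trained=["ue1"], step=0.3)
    print(select_clients(scores, history, k=2))
    print(history["ue1"])
```

Here `ue2` is ranked ahead of `ue1` despite its lower raw score, because `ue1` has just trained and carries a penalty of 0.3.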

9. The method according to any of claims 1-8, wherein the method further comprises:responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by one or more clients, achieving convergence, stopping (710) training the ML model.

10. A computer program (13, 23, 33, 43) comprising instructions which, when executed by a processor (11, 21, 31, 41), cause the processor (11, 21, 31, 41) to perform actions according to any of claims 1-9.

11. A carrier (14, 24, 34, 44) comprising the computer program (13, 23, 33, 43) of claim 10, wherein the carrier (14, 24, 34, 44) is one of an electronic signal, an optical signal, an electromagnetic signal, a magnetic signal, an electric signal, a radio signal, a microwave signal, or a computer-readable storage medium.

12. A training node (130) configured to handle training of a Machine Learning, ML, model in a communication network (100), the training node (130) further being configured to:
calculate a respective training score for one or more first clients from a set of clients, the one or more first clients having performed training of the ML model,
update a set of dictionaries based on the respective training scores,
select one or more subsequent clients from the set of clients based on the set of dictionaries, and
responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by the one or more first clients, not achieving convergence, initiate further training of the ML model by the one or more subsequent clients, or
responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by one or more clients, achieving convergence, stop training the ML model.

13. The training node (130) according to claim 12, wherein, responsive to initiating further training of the ML model, the training node (130) is further configured to:
calculate a respective subsequent training score for the one or more subsequent clients from the set of clients, the one or more subsequent clients having performed training of the ML model,
update the set of dictionaries based on the respective subsequent training scores,
select one or more further subsequent clients from the set of clients based on the set of dictionaries, and
responsive to a subsequent aggregated ML model, which subsequent aggregated model is aggregated from local models trained by the one or more subsequent clients, not achieving convergence, initiate further training of the ML model by the one or more further subsequent clients, or
responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by one or more clients, achieving convergence, stop training the ML model.

14. The training node (130) according to any of claims 12-13, wherein the training score is calculated based on a number of training metrics associated with the respective one or more first clients, and wherein the number of metrics comprises any one or more out of:
- processor load,
- battery,
- memory,
- model drift,
- training time,
- upload speed, and
- energy consumption.

15. The training node (130) according to any of claims 12-14, wherein each metric is associated with a weight factor, and wherein the training score is calculated by weighting each metric with its associated weight factor.

16. The training node (130) according to claim 15, wherein calculating the score comprises applying a penalty factor.

17. The training node (130) according to claim 16, wherein the penalty factor is increased responsive to a client being selected to train the ML model, and wherein the penalty factor is set to zero responsive to a client not being selected for training the ML model for a number of iterations larger than a threshold.

18. The training node (130) according to any of claims 12-17, wherein the set of dictionaries comprises a first dictionary and a second dictionary, and wherein the first dictionary is related to the training score of the clients in the set of clients, and wherein the second dictionary is related to a selection history of the clients in the set of clients.

19. The training node (130) according to claim 18, wherein the selection history comprises a number of times a client has been selected and a number of iterations of training the ML model since the client was selected, and wherein the second dictionary further comprises the penalty factor.

20. The training node (130) according to any of claims 12-19, wherein the training node (130) is further configured to:responsive to an aggregated ML model, which aggregated model is aggregated from local models trained by one or more clients, achieving convergence, stop training the ML model.

21. The training node (130) according to any of claims 12-20, wherein the training node (130) further comprises a client selection function (131), a client profile function (132) and a client resource function (133), and wherein the functions (131, 132, 133) perform features according to any of claims 12-20.
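Claim 21 names three functions but does not allocate the claimed steps among them. The sketch below shows one possible division, which is an assumption rather than the claimed design: the client resource function (133) stores reported metrics, the client profile function (132) turns them into weighted scores and keeps the dictionaries, and the client selection function (131) ranks clients. All class and method names are hypothetical, metrics are assumed normalized so that higher is better, and penalty handling is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ClientResourceFunction:
    """Assumed role of function 133: holds the latest resource metrics per client."""
    reports: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def report(self, client: str, metrics: Dict[str, float]) -> None:
        self.reports[client] = metrics


@dataclass
class ClientProfileFunction:
    """Assumed role of function 132: scores clients and keeps the dictionaries."""
    weights: Dict[str, float]
    scores: Dict[str, float] = field(default_factory=dict)          # first dictionary
    times_selected: Dict[str, int] = field(default_factory=dict)    # part of the second

    def update(self, resources: ClientResourceFunction, trained: List[str]) -> None:
        for client in trained:
            metrics = resources.reports.get(client, {})
            self.scores[client] = sum(self.weights.get(name, 0.0) * value
                                      for name, value in metrics.items())
            self.times_selected[client] = self.times_selected.get(client, 0) + 1


@dataclass
class ClientSelectionFunction:
    """Assumed role of function 131: picks the next clients from the profiles."""
    k: int

    def select(self, profile: ClientProfileFunction) -> List[str]:
        return sorted(profile.scores, key=profile.scores.__getitem__, reverse=True)[: self.k]


if __name__ == "__main__":
    resources = ClientResourceFunction()
    resources.report("ue1", {"battery": 0.9, "upload_speed": 0.4})
    resources.report("ue2", {"battery": 0.3, "upload_speed": 0.8})
    profile = ClientProfileFunction(weights={"battery": 0.5, "upload_speed": 0.5})
    profile.update(resources, trained=["ue1", "ue2"])
    print(ClientSelectionFunction(k=1).select(profile))
```

Keeping the three roles separate lets each be placed in a different network function or process; whether the claimed functions are deployed that way is not specified in the claims.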