Methods and apparatuses for deploying model diversity in a core network
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
- Filing Date
- 2025-04-24
- Publication Date
- 2026-06-25
Abstract
Description
[0001] METHODS AND APPARATUSES FOR DEPLOYING MODEL DIVERSITY IN A CORE NETWORK
[0002] TECHNICAL FIELD
[0003] Embodiments described herein relate to methods and apparatuses for deploying model diversity in a core network (for example a 5G, 6G or beyond core network) to improve processes for providing analytics from a network function to an analytics consumer.
[0004] BACKGROUND
[0005] An analytics consumer, such as a consumer network function (cNF), can subscribe to analytics from a network function, such as a Network Data Analytics Function (NWDAF), by providing for example an analytics identifier (ID) and a preferred accuracy level among other information. The NWDAF selects a trained Machine Learning (ML) model to provide the analytics by running the model in the Analytics logical function (AnLF) in the inference mode.
[0006] ML models are trained via the Model Training Logical Function (MTLF) inside the NWDAF. Currently, multiple trained models can exist for each analytics ID, and information about them can be obtained via the Nnwdaf_MLModelInfo service described in TS 23.288 version 18.5.0. A variety of models with various levels of accuracy and complexity can be obtained by model compression, e.g. via Tensor Decomposition, Data Quantization, and Network Sparsification (see for example L. Deng, G. Li, S. Han, L. Shi and Y. Xie, "Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey," in Proceedings of the IEEE, vol. 108, no. 4, pp. 485-532, April 2020). Additionally, new ML models with different structures can be trained to generate a certain prediction.
[0007] Within an NWDAF, multiple ML models can be run simultaneously in the AnLF to provide predictions to one or more cNFs. Each model may provide inference for a group of User Equipment (UEs), whose size may vary over time (for example, depending on the number of UEs within a certain area). This may cause the NWDAF to have a variable load over time.
SUMMARY
[0008] There currently exist certain challenge(s). The number of UEs for which predictions are generated may vary over time. This makes the load of the NWDAF variable, which may either cause a shortage of resources or require reserving resources in the AnLF, which is wasteful.
[0009] Furthermore, the level of latency of the ML models is not considered when the cNF subscribes to analytics from the NWDAF. Some delay sensitive operations (for example, related to the UE’s movement) may prefer lower latencies.
[0010] Certain aspects of the disclosure and their embodiments may provide solutions to these or other challenges. In some embodiments, it is proposed for the AnLF to use different ML models in different circumstances to perform load balancing, which may involve introducing new signaling and parameters. Different variants of models may be used for performing resource management in the NWDAF. When faced with resource constraints, the NWDAF may, for example, temporarily use a lighter model (potentially a non-neural network-based model) with lower accuracy. This approach may be acceptable for applications that can tolerate such behaviour, for example, user localization for geo advertising.
[0011] Additionally, the latency of an ML model may be considered in the cNF's request to subscribe to the analytics. Furthermore, the ML model’s metadata may be enhanced to include the latency and resource requirements for running in the inference mode. If a model with superior performance metrics takes a longer time to deliver an inference, cannot run many instances in parallel or incurs a high cost, the NWDAF may opt to use a lighter ML model.
[0012] According to a first aspect of the present disclosure, there is provided a method performed by a network function in a communications network for providing analytics to an analytics consumer in the communications network. The method comprises receiving, from the analytics consumer, a request for the analytics. The method further comprises, responsive to available resources at the network function meeting a condition, operating in a first mode of operation by using a first ML model to obtain the analytics, wherein the first ML model requires a first amount of resources to perform inference. The method further comprises, responsive to available resources at the network function not meeting the condition, operating in a second mode of operation by using a second ML model to obtain the analytics, wherein the second ML model requires a second amount of resources to perform inference, wherein the second amount of resources is lower than the first amount of resources. The method further comprises transmitting the analytics to the analytics consumer.
[0013] According to a second aspect of the present disclosure, there is provided a method performed by a network function in a communications network for providing analytics to an analytics consumer in the communications network. The method comprises receiving, from the analytics consumer, a request for the analytics, wherein the request for the analytics comprises an indication of a preferred level of latency. The method further comprises selecting a first Machine Learning, ML, model that performs inference with less than or equal to the preferred level of latency. The method further comprises obtaining, using the first ML model, the analytics. The method further comprises transmitting, to the analytics consumer, the analytics.
[0014] According to a third aspect of the present disclosure, there is provided a method performed by an analytics consumer in a communications network for receiving analytics from a network function. The method comprises transmitting, to the network function, a request for the analytics. The method further comprises receiving, from the network function, the analytics. The method further comprises receiving, from the network function, an indication to the analytics consumer of whether the network function is operating in the first mode of operation or the second mode of operation to obtain the analytics.
[0015] According to a fourth aspect of the present disclosure, there is provided a method performed by an analytics consumer in a communications network for receiving analytics from a network function. The method comprises transmitting, to the network function, a request for the analytics, wherein the request for the analytics comprises an indication of a preferred level of latency. The method further comprises receiving, from the network function, the analytics, wherein the analytics are obtained from a first Machine Learning, ML, model that performs inference with less than or equal to the preferred level of latency.
[0016] According to some embodiments there is provided a network function in a communications network for providing analytics to an analytics consumer in the communications network, the network function comprising processing circuitry and a memory, the memory containing instructions executable by the processing circuitry whereby the network function is operable to receive, from the analytics consumer, a request for the analytics. The memory contains further instructions executable by the processing circuitry whereby the network function is operable to, responsive to available resources at the network function meeting a condition, operate in a first mode of operation by using a first ML model to obtain the analytics, wherein the first ML model requires a first amount of resources to perform inference. The memory contains further instructions executable by the processing circuitry whereby the network function is operable to, responsive to available resources at the network function not meeting the condition, operate in a second mode of operation by using a second ML model to obtain the analytics, wherein the second ML model requires a second amount of resources to perform inference, wherein the second amount of resources is lower than the first amount of resources. The memory contains further instructions executable by the processing circuitry whereby the network function is operable to transmit the analytics to the analytics consumer.
[0017] According to some embodiments there is provided a network function in a communications network for providing analytics to an analytics consumer in the communications network, the network function comprising processing circuitry and a memory, the memory containing instructions executable by the processing circuitry whereby the network function is operable to receive, from the analytics consumer, a request for the analytics, wherein the request for the analytics comprises an indication of a preferred level of latency. The memory contains further instructions executable by the processing circuitry whereby the network function is operable to select a first Machine Learning, ML, model that performs inference with less than or equal to the preferred level of latency. The memory contains further instructions executable by the processing circuitry whereby the network function is operable to obtain, using the first ML model, the analytics. The memory contains further instructions executable by the processing circuitry whereby the network function is operable to transmit, to the analytics consumer, the analytics.
[0018] According to some embodiments there is provided an analytics consumer in a communications network for receiving analytics from a network function, the analytics consumer comprising processing circuitry and a memory, the memory containing instructions executable by the processing circuitry whereby the analytics consumer is operable to transmit, to the network function, a request for the analytics. The memory contains further instructions executable by the processing circuitry whereby the analytics consumer is operable to receive, from the network function, the analytics. The memory contains further instructions executable by the processing circuitry whereby the analytics consumer is operable to receive, from the network function, an indication to the analytics consumer of whether the network function is operating in the first mode of operation or the second mode of operation to obtain the analytics.
[0019] According to some embodiments there is provided an analytics consumer in a communications network for receiving analytics from a network function, the analytics consumer comprising processing circuitry and a memory, the memory containing instructions executable by the processing circuitry whereby the analytics consumer is operable to transmit, to the network function, a request for the analytics, wherein the request for the analytics comprises an indication of a preferred level of latency. The memory contains further instructions executable by the processing circuitry whereby the analytics consumer is operable to receive, from the network function, the analytics, wherein the analytics are obtained from a first Machine Learning, ML, model that performs inference with less than or equal to the preferred level of latency.
[0020] Certain embodiments may provide one or more of the following technical advantage(s). Some embodiments enable avoiding excessive and unnecessary compute and energy consumption within an NWDAF containing an AnLF that runs ML models during an inference phase. Additionally, improved resource management within the NWDAF containing the AnLF may be provided, together with improved scalability of the NWDAF containing the AnLF, which may enable serving more NFs in parallel with the same NWDAF.
[0021] For operations which are delay sensitive, ML model delay as a parameter for model selection during the inference mode may be included in the request for analytics from the cNF and the NWDAF may be enabled to select a lighter ML model with lower latency. Additionally, some embodiments enable the NWDAF to be more responsive to the dynamic (time-dependent) demand of the accuracy requirements of the cNF.
BRIEF DESCRIPTION OF THE DRAWINGS
[0022] For a better understanding of the embodiments of the present disclosure, and to show how it may be put into effect, reference will now be made, by way of example only, to the accompanying drawings, in which:
[0023] Figure 1 illustrates an example architecture of an analytics consumer subscribing to analytics from a network function;
[0024] Figure 2 illustrates a flowchart showing an example method for providing analytics to an analytics consumer in a communications network, according to certain embodiments;
[0025] Figure 3 illustrates a flowchart showing an example method for providing analytics to an analytics consumer in a communications network, according to certain embodiments;
[0026] Figure 4 illustrates a signalling diagram showing an example implementation of the methods of Figures 2 and 3;
[0027] Figure 5 illustrates a flowchart showing an example method for providing analytics to an analytics consumer in a communications network, according to certain embodiments;
[0028] Figure 6 illustrates a flowchart showing an example method for providing analytics to an analytics consumer in a communications network, according to certain embodiments;
[0029] Figure 7 illustrates a signalling diagram showing an example implementation of the method of Figure 5;
[0030] Figure 8 illustrates a signalling diagram showing an example implementation of a combination of the methods of Figures 2, 3, 5 and 6;
[0031] Figure 9 illustrates a signalling diagram showing an example implementation of signalling between different internal functions of a network function;
Figure 10 illustrates a signalling diagram showing an example implementation of signalling to keep stored metadata for an ML model up to date;
[0032] Figure 11 shows a network node in accordance with some embodiments;
[0033] Figure 12 is a block diagram illustrating a virtualization environment in which functions implemented by some embodiments may be virtualized.
[0034] DETAILED DESCRIPTION
[0035] Generally, all terms used herein are to be interpreted according to their ordinary meaning in the relevant technical field, unless a different meaning is clearly given and / or is implied from the context in which it is used. All references to a / an / the element, apparatus, component, means, step, etc. are to be interpreted openly as referring to at least one instance of the element, apparatus, component, means, step, etc., unless explicitly stated otherwise. The steps of any methods disclosed herein do not have to be performed in the exact order disclosed, unless a step is explicitly described as following or preceding another step and / or where it is implicit that a step must follow or precede another step. Any feature of any of the embodiments disclosed herein may be applied to any other embodiment, wherever appropriate. Likewise, any advantage of any of the embodiments may apply to any other embodiments, and vice versa. Other objectives, features and advantages of the enclosed embodiments will be apparent from the following description.
[0036] The following sets forth specific details, such as particular embodiments or examples for purposes of explanation and not limitation. It will be appreciated by one skilled in the art that other examples may be employed apart from these specific details. In some instances, detailed descriptions of well-known methods, nodes, interfaces, circuits, and devices are omitted so as not to obscure the description with unnecessary detail. Those skilled in the art will appreciate that the functions described may be implemented in one or more nodes using hardware circuitry (e.g., analog and / or discrete logic gates interconnected to perform a specialized function, ASICs, PLAs, etc.) and / or using software programs and data in conjunction with one or more digital microprocessors or general purpose computers. Nodes that communicate using the air interface may have suitable radio communications circuitry. Moreover, where appropriate the technology can additionally be considered to be embodied entirely within any form of computer-readable memory, such as (ROM, EEPROM, Flash memory, a memory disc, RAM etc.) solid-state memory, magnetic disk, or optical disk containing an appropriate set of computer instructions that would cause a processor to carry out the techniques described herein.
[0037] Hardware implementation may include or encompass, without limitation, digital signal processor (DSP) hardware, a reduced instruction set processor, hardware (e.g., digital or analogue) circuitry including but not limited to application specific integrated circuit(s) (ASIC) and / or field programmable gate array(s) (FPGA(s)), and (where appropriate) state machines capable of performing such functions.
[0038] Certain aspects of the present disclosure and their embodiments may provide solutions to these or other challenges.
[0039] Particular embodiments are described more fully with reference to the accompanying drawings. Other embodiments, however, are contained within the scope of the subject matter disclosed herein. The disclosed subject matter should not be construed as limited to only the embodiments set forth herein; rather, these embodiments are provided by way of example to convey the scope of the subject matter to those skilled in the art.
[0040] For the purposes of the present disclosure, the term “ML model” encompasses within its scope the following concepts:
[0041] Machine Learning algorithms, comprising processes or instructions through which data may be used in a training process to generate a model artefact for performing a given task, or for representing a real world process or system; the model artefact that is created by such a training process, and which comprises the computational architecture that performs the task; and the process performed by the model artefact in order to complete the task.
[0042] References to “ML model”, “model”, “model parameters”, “model information”, etc., may thus be understood as relating to any one or more of the above concepts encompassed within the scope of “ML model”. It will be appreciated that a representation of an ML model may comprise parameters of the ML model, the ML model itself, or any other suitable representation of the ML model that may be utilised appropriately by the node receiving the representation. A representation of an ML model may for example include details of the architecture of the model and values for its trainable parameters, for example number of hidden layers, number of neurons per layer, an activation function of particular neurons and / or weights of neurons.
[0043] Figure 1 illustrates an example architecture of an analytics consumer subscribing to analytics from a network function. In the example architecture, the analytics consumer is a cNF and the network function is a NWDAF. The interfaces towards the NWDAF and cNF are shown as Nnwdaf, and Ncnf, respectively. The NWDAF may comprise an AnLF, a Model Training Logical Function (MTLF) and a Model Complexity Management Logical Function (MCMLF).
[0044] The MCMLF may be used to manage and choose among different alternative models based on the requested analytics and the available resources. Furthermore, the MCMLF may decide on whether a normal mode or a resource saving mode (as described below) should apply for delivering analytics. In some embodiments, the AnLF may perform the role of MCMLF as described herein. In such cases, the interfaces between the AnLF and the MCMLF may be unnecessary. However, the existing interfaces between the AnLF and the MTLF may remain.
[0045] Figure 2 is a flowchart illustrating a method 200 for providing analytics to an analytics consumer in a communications network, according to certain embodiments.
[0046] The method 200 of Figure 2 may be performed by a network function, for example a core network node, which may comprise a physical or virtual node, and may be implemented in a computing device or server apparatus and / or in a virtualized environment, for example in a cloud, edge cloud or fog deployment. In particular, the method 200 may be performed by an NWDAF, and the NWDAF may comprise one or more of an AnLF, an MTLF and an MCMLF (e.g. as illustrated in Figure 1). The network function may be in communication with the analytics consumer. The analytics consumer may comprise a consumer network function (cNF) or some other consumer of the analytics provided by the NWDAF, for example a Policy Control Function (PCF), Network Exposure Function (NEF) or Management Data Analytics Function (CAM). The analytics consumer may comprise an analytics consumer performing the method 300 of Figure 3.
[0047] In step 210, the network function receives, from the analytics consumer, a request for the analytics. It will be appreciated that the analytics may comprise predictions or statistics provided by the NWDAF, for example, the analytics may be produced by ML models run by the NWDAF.
[0048] In step 220, the network function evaluates whether available resources at the network function meet a condition. In some embodiments, the condition comprises a threshold condition. In other words, the network function may evaluate whether the resources utilised at the network function allow for continued operation in a first mode of operation (e.g. a normal mode of operation).
[0049] If the available resources at the network function meet the condition, the method 200 proceeds to step 230. If the available resources at the network function do not meet the condition, the method 200 proceeds to step 240. For example, the available resources may be expressed in terms of any one or more of: a percentage of CPU / GPU resources that are available, an amount of memory or cache available, an available energy budget and an available computational capacity in terms of Floating-Point Operations Per Second (FLOPS).
[0050] In step 230, the network function operates in a first mode of operation (e.g., a normal mode of operation) by using a first ML model to obtain the analytics, wherein the first ML model requires a first amount of resources to perform inference. In some embodiments, the first amount of resources comprises one or more of: a number of floating-point operations, a percentage of CPU / GPU usage, and an amount of memory or cache.
[0051] In step 240, the network function operates in a second mode of operation (e.g. a resource saving mode of operation) by using a second ML model to obtain the analytics, wherein the second ML model requires a second amount of resources to perform inference, wherein the second amount of resources is lower than the first amount of resources. In some embodiments, the second amount of resources comprises one or more of: a number of floating-point operations, a percentage of CPU / GPU usage, and an amount of memory or cache. In other words, by performing the method of Figure 2 the network function may choose to operate in either a first mode of operation or a second mode of operation dependent on whether the available resources at the network function meet the condition. The first and second modes of operation utilise differing ML models that require different amounts of resources. So, if resource availability at the network function is low (e.g. does not meet the condition), the network function can opt to use the second mode of operation in order to save resources.
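By way of illustration only, the decision of steps 220 to 240 may be sketched as follows. The sketch assumes that the condition is a simple threshold on the fraction of CPU / GPU resources currently free; the names `Mode` and `choose_mode` and the default threshold of 0.3 are hypothetical and are not defined by the disclosure, which leaves the form of the condition open.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "first mode of operation"
    RESOURCE_SAVING = "second mode of operation"


def choose_mode(available_fraction: float, threshold: float = 0.3) -> Mode:
    """Return the mode of operation for the current request.

    Assumption: the condition of step 220 is met when the free fraction
    of CPU/GPU resources is at least `threshold` (hypothetical value).
    """
    if available_fraction >= threshold:
        return Mode.NORMAL           # step 230: use the first ML model
    return Mode.RESOURCE_SAVING      # step 240: use the lighter second ML model
```

In such a sketch, the same check could equally be expressed on memory, energy budget or FLOPS, as listed above.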
[0052] In some embodiments, the method 200 further comprises the network function selecting the first ML model and / or the second ML model. The network function selecting the first ML model and / or the second ML model may comprise the network function obtaining information associated with available ML models at the network function (e.g. ML models that the network function can utilise) capable of providing the analytics (e.g. those ML models that provide the correct output). The information associated with the available ML models may be obtained so that the network function can select appropriate available ML models as the first ML model and / or the second ML model.
[0053] Information for an available ML model may comprise one or more of: an indication of an amount of resources required by the available ML model to perform inference, an indication of latency of the available ML model in performing inference and an indication of accuracy of the available ML model in performing inference. This information may be stored for each of the available ML models at the network function.
[0054] In some embodiments, the method 200 further comprises the network function storing the information associated with the available ML models in an MTLF of the network function. In some embodiments, for an available ML model, the network function updates the information responsive to utilising the available ML model to perform inference. In other words, on performing inference with a particular available ML model, the AnLF may record for example the latency of the ML model and may then indicate this to the MTLF which may update the stored information if appropriate.
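As a non-limiting sketch, the stored information for an available ML model and its update after inference might be represented as below. The record layout (`ModelInfo`), the function `record_latency` and the smoothing rule with a weight of 0.2 are assumptions made for illustration; the disclosure only states that the stored information may be updated if appropriate.

```python
from dataclasses import dataclass


@dataclass
class ModelInfo:
    model_id: str
    flops: float       # resources required to perform one inference
    latency_ms: float  # stored inference latency
    accuracy: float    # stored inference accuracy, between 0 and 1


def record_latency(info: ModelInfo, measured_ms: float, weight: float = 0.2) -> None:
    """Blend a newly measured latency into the stored value.

    Assumption: an exponential moving average is used, so that a single
    outlier measurement does not overwrite the stored latency.
    """
    info.latency_ms = (1 - weight) * info.latency_ms + weight * measured_ms
```

For example, a stored latency of 10 ms followed by a measurement of 20 ms would yield a stored latency of 12 ms under this assumed rule.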
[0055] In some examples therefore, the network function may select a first ML model and a second ML model based on the stored indications of the amount of resources the respective models require to perform inference. In some embodiments of step 210 of the method 200, the request for analytics further comprises a preferred accuracy level. The network function selecting the first ML model may then comprise the network function selecting the first ML model as one that provides the preferred accuracy level. In some embodiments of step 210 of the method 200, the request for analytics further comprises a preferred minimum accuracy level. The preferred minimum accuracy level is lower than the preferred accuracy level. The network function selecting the second ML model may then comprise the network function selecting the second ML model as one that provides the preferred minimum accuracy level.
[0056] The preferred accuracy level may comprise the current Preferred Accuracy Parameter [e.g. as outlined in TS 23.288 version 18.5.0]. The preferred accuracy level may be considered an accuracy level that the analytics consumer is requesting during a normal operational mode of the AnLF (e.g. the first mode of operation). The preferred minimum accuracy level parameter as described above is an accuracy level that, although not preferred, may be accepted as a fallback, e.g. during the second mode of operation (e.g. resource saving mode of operation) according to embodiments described herein. Following this approach, it may be considered mandatory to include the preferred accuracy parameter, whereas the minimum accuracy parameter may be considered optional. Consequently, it will be appreciated that the existing preferred accuracy parameter may require greater accuracy than the preferred minimum accuracy level parameter described herein.
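One possible way of selecting the first and second ML models from the two accuracy parameters is sketched below, by way of example only. The tie-breaking rule (choosing the cheapest model that satisfies each accuracy level) and all names are assumptions for illustration; the disclosure requires only that each model provides the respective accuracy level and that the second model requires fewer resources than the first.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ModelInfo:
    model_id: str
    cost: float      # resources needed for inference (arbitrary units)
    accuracy: float  # between 0 and 1


def cheapest_meeting(models: List[ModelInfo], min_accuracy: float) -> Optional[ModelInfo]:
    """Return the lowest-cost model whose accuracy is at least min_accuracy."""
    candidates = [m for m in models if m.accuracy >= min_accuracy]
    return min(candidates, key=lambda m: m.cost, default=None)


def select_pair(
    models: List[ModelInfo],
    preferred_accuracy: float,
    preferred_min_accuracy: Optional[float] = None,
) -> Tuple[Optional[ModelInfo], Optional[ModelInfo]]:
    """Pick the first model (normal mode) and, if a fallback accuracy is
    given, a second model that needs fewer resources than the first."""
    first = cheapest_meeting(models, preferred_accuracy)
    second = None
    if first is not None and preferred_min_accuracy is not None:
        lighter = [m for m in models if m.cost < first.cost]
        second = cheapest_meeting(lighter, preferred_min_accuracy)
    return first, second
```

Because the preferred minimum accuracy level is optional, the sketch returns no second model when it is omitted.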
[0057] Additionally, in some embodiments of step 210 of the method 200, the request for analytics comprises an indication of a preferred level of latency. The network function selecting the first ML model may then comprise the network function selecting the first ML model as one that performs inference in less than or equal to the preferred level of latency.
[0058] In some examples, the network function selecting the second ML model may then comprise the network function selecting the second ML model as one that performs inference in less than or equal to the preferred level of latency. In this example, the second mode of operation makes no concession with regards to latency, and both the first ML model and the second ML model are required to provide at least the preferred level of latency. In other examples, the request for analytics of step 210 may further comprise a preferred maximum latency level. The preferred maximum latency level is greater than the preferred level of latency. The network function selecting the second ML model may then comprise the network function selecting the second ML model as one that performs inference in less than or equal to the preferred maximum level of latency. In this example therefore the resource saving mode makes a concession as to the allowable latency as well as the required resources, and the latency provided by the second ML model is allowed to be greater than the latency provided by the first ML model.
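The two latency variants described above may be illustrated as follows. This is a minimal sketch under stated assumptions: latencies are given in milliseconds, and the function names `second_model_bound` and `eligible_models` are hypothetical.

```python
from typing import Dict, List, Optional


def second_model_bound(preferred_ms: float, preferred_max_ms: Optional[float] = None) -> float:
    """Latency limit applied when selecting the second ML model.

    Without a preferred maximum latency, the second mode makes no
    concession and the preferred latency still applies.
    """
    if preferred_max_ms is None:
        return preferred_ms
    if preferred_max_ms <= preferred_ms:
        raise ValueError("preferred maximum latency must exceed preferred latency")
    return preferred_max_ms


def eligible_models(latencies_ms: Dict[str, float], bound_ms: float) -> List[str]:
    """Names of models whose inference latency is within the bound."""
    return sorted(name for name, lat in latencies_ms.items() if lat <= bound_ms)
```

With stored latencies of 5 ms, 12 ms and 30 ms and a preferred latency of 10 ms, only the first model is eligible in the strict variant, whereas a preferred maximum latency of 20 ms would also admit the second.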
[0059] In some embodiments, the network function selects the first ML model for the first mode of operation responsive to receiving the request for analytics from the analytics consumer.
[0060] In some embodiments, the network function selects the second ML model for the second mode of operation responsive to receiving the request for analytics from the analytics consumer. In this example both the first ML model and the second ML model are selected concurrently in preparation for any reduction in available resources. This enables the network function to deploy the second ML model quickly, should the available resources at the network function fail to meet the condition.
[0061] In other embodiments, the network function selects the second ML model for the second mode of operation responsive to the available resources at the network function not meeting the condition. In some embodiments, the network function selects the second ML model to ensure that the second amount of resources is less than or equal to the available resources. In this example, the second ML model is selected appropriately based on the amount of resources available.
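Selecting the second ML model at the time the condition fails, so that it fits within the available resources, could for example look like the sketch below. Choosing the most accurate model among those that fit is an assumption made here for illustration; the tuple layout and the name `pick_fitting_model` are likewise hypothetical.

```python
from typing import List, Optional, Tuple

# Each entry: (model name, resources required for inference, accuracy)
Model = Tuple[str, float, float]


def pick_fitting_model(models: List[Model], available: float) -> Optional[str]:
    """Return the most accurate model whose resource need does not
    exceed the available resources, or None if no model fits."""
    fitting = [m for m in models if m[1] <= available]
    if not fitting:
        return None
    return max(fitting, key=lambda m: m[2])[0]
```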
[0062] In some embodiments, the request for analytics further comprises an indication of a duration for which the network function is allowed to provide analytics to the analytics consumer utilising the second mode of operation. In some embodiments, the request for analytics further comprises an indication of a minimum duration between instances of utilising the second mode of operation. These parameters (e.g. the duration for which the network function is allowed to use the second mode of operation, and the minimum duration between instances of utilising the second mode of operation) limit the use of the second mode of operation with a particular analytics consumer. This may provide a level of protection for an analytics consumer to ensure that a certain level of acceptable service is provided.
[0063] It will be appreciated that if a network function is serving a plurality of analytics consumers concurrently, it may perform load balancing in such a way as to ensure that the resources are fairly distributed amongst the plurality of analytics consumers. The parameters of the duration for which the network function is allowed to use the second mode of operation, and the minimum duration between instances of utilising the second mode of operation, may provide a mechanism for the network function to ensure that the resources are being fairly distributed.
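The two duration parameters may, purely as an example, be enforced per analytics consumer with a small bookkeeping object such as the one below. The class name `SecondModeLimiter`, its fields and the use of seconds as the time unit are assumptions for illustration and are not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SecondModeLimiter:
    max_duration_s: float  # longest allowed stretch in the second mode
    min_gap_s: float       # shortest allowed pause between stretches
    started_at: Optional[float] = None
    ended_at: Optional[float] = None

    def may_use(self, now: float) -> bool:
        """Whether the second mode may be used for this consumer at `now`."""
        if self.started_at is not None:  # a stretch is in progress
            return now - self.started_at <= self.max_duration_s
        if self.ended_at is not None:    # a previous stretch has ended
            return now - self.ended_at >= self.min_gap_s
        return True                      # second mode never used so far

    def enter(self, now: float) -> None:
        self.started_at = now

    def leave(self, now: float) -> None:
        self.started_at = None
        self.ended_at = now
```

A network function serving several consumers could keep one such object per subscription and prefer, when resources are short, a consumer for which `may_use` still returns true.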
[0064] In some embodiments, the request for analytics further comprises an indication of a duration for which one or more parameters indicated in the request for analytics are valid. For example, the duration may indicate that the preferred level of latency indicated in the request is valid for example during day-time hours. There may be another indication of another preferred level of latency that is valid during nighttime hours. For example, the analytics consumer may require the ML model to operate with lower latency during daytime hours due to higher expected volumes of traffic via the relevant applications utilising the analytics.
[0065] The indication of the duration for which one or more parameters indicated in the request for analytics are valid therefore enables the NWDAF to be more responsive to the dynamic (time-dependent) demand of the various requirements of the cNF (e.g. accuracy, latency, etc).
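A time-dependent parameter of this kind may be illustrated with the following sketch, in which each preferred level of latency carries the hours of the day during which it is valid. The hour-based encoding and the names `TimedLatency` and `latency_for_hour` are assumptions; the disclosure does not specify how the validity duration is encoded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TimedLatency:
    start_hour: int              # inclusive, 0..23
    end_hour: int                # exclusive, 1..24
    preferred_latency_ms: float


def latency_for_hour(entries: List[TimedLatency], hour: int) -> Optional[float]:
    """Return the preferred latency valid at the given hour, if any."""
    for entry in entries:
        if entry.start_hour <= hour < entry.end_hour:
            return entry.preferred_latency_ms
    return None
```

For instance, a request might carry a 10 ms entry for hours 8 to 20 and 50 ms entries for the remaining hours, reflecting the daytime and nighttime example above.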
[0066] After the network function has obtained the analytics in step 230 or 240, the method 200 proceeds to step 250.
[0067] In step 250, the network function transmits, to the analytics consumer, the analytics.
[0068] In some embodiments, the method 200 further comprises the network function transmitting, to the analytics consumer, an indication of whether the network function is operating in the first mode of operation or the second mode of operation to obtain the analytics. This indication of whether the network function is operating in a first mode of operation or a second mode of operation may be used by the analytics consumer to adjust its own operation. For example, a reliability or weighting applied by the analytics consumer to the analytics may be lowered if the second mode of operation is used, as the analytics may be considered less reliable.
[0069] The network function may also indicate the parameters of (or a representation of) the ML model used to produce the analytics to the analytics consumer. For example, the method 200 may comprise, responsive to operating in the first mode of operation, transmitting, to the analytics consumer, a notification comprising model parameters of the first ML model (e.g. comprising a representation of the first model). In other embodiments, the method 200 comprises, responsive to operating in the second mode of operation, transmitting, to the analytics consumer, a notification comprising model parameters of the second ML model.
[0070] Figure 3 is a flowchart illustrating a method 300 for receiving analytics from a network function in a communications network, according to certain embodiments.
[0071] The method 300 may be performed by an analytics consumer, which may comprise a physical or virtual node, and may be implemented in a computing device or server apparatus and / or in a virtualized environment, for example in a cloud, edge cloud or fog deployment. In particular, the method 300 may be performed by a cNF. The analytics consumer may be in communication with a network function. The network function may comprise a network function performing the method 200 of Figure 2.
[0072] In step 310, the analytics consumer transmits, to the network function, a request for the analytics. Step 310 corresponds to step 210 of Figure 2.
[0073] In step 320, the analytics consumer receives, from the network function, the analytics. Step 320 corresponds to step 250 of Figure 2.
[0074] In step 330, the analytics consumer receives, from the network function, an indication to the analytics consumer of whether the network function is operating in the first mode of operation or the second mode of operation to obtain the analytics. As described above, the analytics consumer may then utilise the indication of whether the network function is operating in the first mode of operation or the second mode of operation to adjust its own operation. For example, a reliability or weighting applied by the analytics consumer to the analytics may be lowered if the second mode of operation is used, as the analytics may be considered less reliable.
[0075] The methods of Figures 2 and 3 enable a new operation mode for prediction delivery, the “resource saving mode”, where the NWDAF chooses a new ML model to run. These methods may also provide for a new information element in the Nnwdaf_AnalyticsSubscription Notify message, e.g. an analytics delivery mode information element, and a new information element in the Nnwdaf_AnalyticsSubscription Request, a complexity management container information element. The analytics delivery mode parameter allows the NWDAF to specify whether the analytics are generated in resource saving mode or in normal mode. This may be complementary to the existing “Analytics energy consumption” proposed in [Solution #12, Clause 6.12.2.3, TR 23.700-66]. While “Analytics energy consumption” is a metric (field) for new analytics indicating the power used by an operation (likely measured in watts), the “Analytics Delivery Mode” information element may be a binary flag that simply designates the mode (e.g. normal mode or resource saving mode) in which the requested analytics are generated.
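The following sketch illustrates, under stated assumptions, how an analytics consumer might act on a binary analytics delivery mode flag by lowering the weight it applies to the received analytics. The enumeration values, the weights and the blending rule are hypothetical; the embodiments only state that a reliability or weighting may be lowered in the second mode of operation.

```python
from enum import Enum


class AnalyticsDeliveryMode(Enum):
    """Binary flag carried with the analytics (encoding is an assumption)."""
    NORMAL = 0
    RESOURCE_SAVING = 1


# Hypothetical consumer-side weights applied to the received analytics.
MODE_WEIGHT = {
    AnalyticsDeliveryMode.NORMAL: 1.0,
    AnalyticsDeliveryMode.RESOURCE_SAVING: 0.5,
}


def blend(prediction: float, fallback: float,
          mode: AnalyticsDeliveryMode) -> float:
    """Blend a received prediction with the consumer's own fallback estimate.

    The received prediction counts fully in normal mode and only partly in
    resource saving mode, where it may be considered less reliable.
    """
    w = MODE_WEIGHT[mode]
    return w * prediction + (1.0 - w) * fallback
```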
[0076] Figure 4 is a signalling diagram illustrating an example implementation of method 200 of Figure 2 and method 300 of Figure 3. In this example, the method 200 is performed by an NWDAF 402 and the method 300 is performed by a cNF 401. In some embodiments, the NWDAF 402 comprises an AnLF.
[0077] The example implementation begins at step 410. Step 410 may be considered an example implementation of step 210 of Figure 2 and step 310 of Figure 3. In step 410, the cNF 401 transmits, to the NWDAF 402, a request for analytics from the NWDAF. The cNF may include in the request for the analytics an analytics ID and, as an optional input, the complexity management container, which may comprise one or more of the following parameters:
[0078] - a preferred minimum accuracy level. This parameter indicates a minimum level of accuracy required by the cNF. The level may be used as a design guideline in the resource saving mode. The cNF may still communicate its preferred level of accuracy (for the normal mode). The preferred minimum accuracy level parameter can either be a single number or a vector corresponding to each time interval indicated by a Time Slot Vector.
- a Time Slot Vector. This parameter is a vector indicating the start and end of each time interval.
[0079] - a maximum acceptable resource saving time. This parameter indicates a duration for which the network function is allowed to provide analytics to the analytics consumer utilising the resource saving mode of operation. There may be a correlation between this duration and the intervals (potentially) reported in the Time Slot Vector.
[0080] - a guard time. The guard time may comprise an indication of a minimum duration between instances of utilising the resource saving mode of operation.
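For illustration only, the parameters listed above could be grouped in a structure such as the following. The class name, field names, units and validation rules are assumptions for this sketch and do not represent a standardised encoding of the complexity management container.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class ComplexityManagementContainer:
    """Optional request parameters (all field names are hypothetical)."""
    # Single number, or one value per interval of the Time Slot Vector.
    min_accuracy: Optional[Union[float, List[float]]] = None
    # (start, end) of each time interval, in an arbitrary time unit.
    time_slot_vector: Optional[List[Tuple[float, float]]] = None
    max_resource_saving_time_s: Optional[float] = None
    guard_time_s: Optional[float] = None

    def validate(self) -> None:
        """Check that a vector-valued accuracy matches the time slots."""
        slots = self.time_slot_vector or []
        if isinstance(self.min_accuracy, list) and len(self.min_accuracy) != len(slots):
            raise ValueError("accuracy vector needs one entry per time slot")
        for start, end in slots:
            if end <= start:
                raise ValueError("a time slot must end after it starts")

    def min_accuracy_at(self, t: float) -> Optional[float]:
        """Return the minimum accuracy that applies at time t, if any."""
        if not isinstance(self.min_accuracy, list):
            return self.min_accuracy
        for (start, end), acc in zip(self.time_slot_vector or [], self.min_accuracy):
            if start <= t < end:
                return acc
        return None
```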
[0081] In step 420, the NWDAF 402 selects ML models for normal and resource saving modes. The NWDAF 402 selects a first ML model for normal operation mode and may also select a second ML model for the resource saving mode. In the resource saving mode, the predictions provided by the NWDAF 402 may have a lower accuracy level than the accuracy level in the normal mode due to resource shortage. As described above with reference to Figure 2, to select the right model for a mode of operation, the ML model information may be enhanced by ML model resource requirement information, which may include parameters such as the model complexity in terms of FLOPS and / or hardware requirements. The hardware requirements may include, for example, the percentage of CPU / GPU usage, an amount of memory and an amount of cache.
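As a minimal sketch, assuming each available ML model is described by an accuracy value and resource requirement information, the selection of a model per mode could look as follows. The selection policy (lowest complexity for normal mode, highest accuracy within a resource budget for resource saving mode), the field names and the example figures are assumptions, not the selection logic of the embodiments.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ModelInfo:
    """ML model information with resource requirements (hypothetical fields)."""
    name: str
    accuracy: float    # fraction in [0, 1]
    flops: float       # model complexity per inference
    memory_mb: float   # example hardware requirement


def select_models(models: List[ModelInfo],
                  preferred_accuracy: float,
                  min_accuracy: float,
                  saving_flops_budget: float
                  ) -> Tuple[Optional[ModelInfo], Optional[ModelInfo]]:
    """Return (normal-mode model, resource-saving-mode model)."""
    # Normal mode: least complex model that meets the preferred accuracy.
    normal = min((m for m in models if m.accuracy >= preferred_accuracy),
                 key=lambda m: m.flops, default=None)
    # Resource saving mode: most accurate model that fits the reduced budget
    # while still meeting the preferred minimum accuracy level.
    saving = max((m for m in models
                  if m.flops <= saving_flops_budget and m.accuracy >= min_accuracy),
                 key=lambda m: m.accuracy, default=None)
    return normal, saving


models = [
    ModelInfo("full", accuracy=0.95, flops=1e9, memory_mb=800),
    ModelInfo("pruned", accuracy=0.90, flops=4e8, memory_mb=300),
    ModelInfo("tiny", accuracy=0.80, flops=5e7, memory_mb=40),
]
```

In this sketch, no resource saving model is returned when none of the candidates satisfies both the budget and the minimum accuracy, leaving it to the caller to decide how to proceed.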
[0082] In step 430, the NWDAF 402 transmits, to the cNF 401, a notification (e.g. with an analytics delivery mode parameter) that the NWDAF 402 is operating in the normal mode of operation. Step 430 is an example implementation of step 330 of Figure 3. For example, the NWDAF 402 may notify the cNF 401 about the availability of the requested analytics as per the current specification and use the ML model parameters for the normal operating mode.
[0083] In step 440, analytics flow between the NWDAF 402, operating in the normal mode, and the cNF 401. Step 440 is an example implementation of step 250 of Figure 2 and step 320 of Figure 3. In step 450, the NWDAF 402 starts operating in the resource saving mode. Step 450 is an example implementation of step 240 of Figure 2. The NWDAF 402 may start operating in the resource saving mode, for example, due to a shortage of resources in the AnLF.
[0084] In step 460, the NWDAF 402 transmits, to the cNF 401, a notification (e.g. the analytics delivery mode parameter) that the NWDAF 402 is operating in the resource saving mode of operation. Step 460 is an example implementation of step 330 of Figure 3. The NWDAF 402 may also notify the cNF 401 about ML model parameters for resource saving operating mode.
[0085] In step 470, analytics flow between the NWDAF 402, operating in the resource saving mode, and the cNF 401. Step 470 is an example implementation of step 250 of Figure 2 and step 320 of Figure 3.
[0086] Figure 5 is a flowchart illustrating a method 500 for providing analytics to an analytics consumer in a communications network, according to certain embodiments.
[0087] The method 500 of Figure 5 may be performed by a network function, for example a core network node, which may comprise a physical or virtual node, and may be implemented in a computing device or server apparatus and / or in a virtualized environment, for example in a cloud, edge cloud or fog deployment. The network function may be in communication with the analytics consumer. The analytics consumer may comprise a cNF or any other consumer of analytics. The analytics consumer may comprise an analytics consumer performing the method 600 of Figure 6.
[0088] In step 510, the network function receives, from the analytics consumer, a request for the analytics. The request for the analytics comprises an indication of a preferred level of latency. In some embodiments, the request for the analytics further comprises an identification of the requested analytics.
[0089] In step 520, the network function selects a first ML model that performs inference with less than or equal to the preferred level of latency. In some embodiments, the network function selecting the first ML model comprises the network function obtaining information associated with available ML models at the network function capable of providing the analytics.
[0090] By providing the preferred level of latency parameter, the analytics consumer is able to specify an allowable level of latency, which affords the analytics consumer control over the latency provided by the ML model selected by the network function.
[0091] The information for an available ML model may comprise one or more of: an indication of latency of the available ML model in performing inference and an indication of accuracy of the available ML model in performing inference. In some embodiments, step 520 of the method 500 further comprises the network function storing the information associated with the available ML models in an MTLF of the network function. In some embodiments, for an available ML model, the network function updates the information responsive to utilising the available ML model to perform inference.
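By way of example only, selecting a first ML model that performs inference with less than or equal to the preferred level of latency could be sketched as below. Choosing the most accurate of the eligible models is an assumed tie-breaking policy, and the field names and millisecond unit are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AvailableModel:
    """Information held for an available ML model (hypothetical fields)."""
    name: str
    latency_ms: float  # indication of latency in performing inference
    accuracy: float    # indication of accuracy in performing inference


def select_by_latency(models: List[AvailableModel],
                      preferred_latency_ms: float) -> Optional[AvailableModel]:
    """Pick a model whose latency does not exceed the preferred level.

    Among the eligible models, the most accurate one is returned; None is
    returned when no available model satisfies the latency preference.
    """
    eligible = [m for m in models if m.latency_ms <= preferred_latency_ms]
    return max(eligible, key=lambda m: m.accuracy, default=None)


available = [
    AvailableModel("full", latency_ms=80.0, accuracy=0.95),
    AvailableModel("pruned", latency_ms=30.0, accuracy=0.90),
    AvailableModel("tiny", latency_ms=5.0, accuracy=0.80),
]
```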
[0092] In step 530, the network function obtains, using the first ML model, the analytics.
[0093] In step 540, the network function transmits, to the analytics consumer, the analytics.
[0094] In some embodiments, the method 500 further comprises the network function transmitting, to the analytics consumer, a notification comprising model parameters of the first ML model.
[0095] Figure 6 is a flowchart illustrating a method 600 for receiving analytics from a network function in a communications network, according to certain embodiments.
[0096] The method 600 may be performed by an analytics consumer, which may comprise a physical or virtual node, and may be implemented in a computing device or server apparatus and / or in a virtualized environment, for example in a cloud, edge cloud or fog deployment. In particular, the method 600 may be performed by a cNF or any other consumer of analytics. The analytics consumer may be in communication with a network function. The network function may comprise a network function performing the method 500 of Figure 5. In step 610, the analytics consumer transmits, to the network function, a request for the analytics, wherein the request for the analytics comprises an indication of a preferred level of latency. Step 610 corresponds to step 510 of Figure 5.
[0097] In step 620, the analytics consumer receives, from the network function, the analytics, wherein the analytics are obtained from a first Machine Learning, ML, model that performs inference with less than or equal to the preferred level of latency. Step 620 corresponds to step 540 of Figure 5.
[0098] Figure 7 is a signalling diagram illustrating an example implementation of method 500 of Figure 5 and method 600 of Figure 6. In this example, the method 500 is performed by an NWDAF 702 and the method 600 is performed by a cNF 701. In some embodiments, the NWDAF 702 comprises an AnLF.
[0099] The example implementation begins at step 710, which is an example implementation of step 510 of Figure 5 and step 610 of Figure 6. In step 710, the cNF 701 transmits, to the NWDAF 702, a request for analytics from the NWDAF. The request for analytics comprises an indication of a preferred level of latency. Because the request includes the preferred level of latency, the NWDAF 702 may select a more suitable ML model to provide the analytics. For example, with the preferred level of latency set to “Low” the NWDAF 702 may select a less accurate but faster model.
[0100] The cNF 701 may further include in the request for the analytics an analytics ID and, as an optional input, the complexity management container, which may comprise one or more of the following parameters described above: a Time Slot Vector, a maximum acceptable resource saving time and a guard time.
[0101] In step 720, the NWDAF 702 obtains the analytics using a first ML model. The first ML model is selected according to the preferred level of latency received in the request for analytics from the cNF 701. The NWDAF 702 selecting the first ML model according to the preferred level of latency may comprise the NWDAF 702 selecting, as the first ML model, an ML model that performs inference with less than or equal to the preferred level of latency. Step 720 is an example implementation of steps 520 and 530 of Figure 5. To select the right model, the ML model information may be enhanced by ML model latency, which is the time taken to generate analytics from the input data. The ML model information may also be enhanced by ML model resource requirement information, which may include one or more of the following parameters: model complexity in terms of FLOPS, and hardware requirements such as, for example, the percentage of CPU / GPU usage, an amount of memory and an amount of cache.
[0102] In step 730, analytics flow between the NWDAF 702 and the cNF 701. Step 730 is an example implementation of step 540 of Figure 5 and step 620 of Figure 6.
[0103] It will be appreciated that in some cases, the methods of Figures 2, 3, 5 and 6 may be combined.
[0104] Figure 8 is a signalling diagram illustrating an example implementation of a combination of the methods of Figures 2, 3, 5 and 6. In this example, methods 200 and 500 are performed by an NWDAF 802 and methods 300 and 600 are performed by a cNF 801. In some embodiments, the NWDAF 802 comprises an AnLF. The example implementation begins at step 810, which is a repeat of step 710 in Figure 7.
[0105] In step 820, the NWDAF 802 selects ML models for normal and resource saving modes. The NWDAF 802 selects a first ML model for normal operation mode and may also select a second ML model for the resource saving mode. To select the right model for each mode of operation, the ML model information may be enhanced by ML model resource requirement information, which may include parameters such as the model complexity in terms of FLOPS and / or hardware requirements. The hardware requirements may include, for example, the percentage of CPU / GPU usage, an amount of memory and an amount of cache. Additionally, the NWDAF 802 may select at least the first ML model that performs inference with less than or equal to the preferred level of latency received in the request for analytics from the cNF 801. Hence, for the normal and / or resource saving mode, the NWDAF 802 may select the right ML model for a mode according to the ML model resource requirement information and according to the preferred level of latency. As mentioned above with reference to Figure 3, the request of step 810 may also comprise a maximum preferred level of latency which may be utilized to select the second ML model.
[0106] Steps 830 to 870 are repeats of steps 430 to 470 in Figure 4, respectively. It will be appreciated that, as described in Figure 1, the NWDAF may comprise internal functions. The internal functions may include an AnLF, an MTLF and an MCMLF. ML models may be trained by the MTLF and implemented by the AnLF. The MTLF may additionally store an extended set of metadata (e.g. the mode information referred to above) associated with each model in the normal mode and resource saving mode. The extended set of metadata may comprise latency and resource requirements. Metadata information may be available in Nnwdaf_MLModelInfo_Response and Nnwdaf_MLModelProvision_Notify. The MCMLF may manage, via the MTLF, and choose among different alternative models, including resource allocation settings and deployment environment, based on the requested analytics.
[0107] Between the internal functions of the NWDAF, there may be signalling for the interaction between the AnLF and the MCMLF for requesting, or subscribing to and being notified of, decisions about which models must be used in each working mode (normal or resource saving). There may also be signalling for the interaction between the AnLF and the MTLF so that the MTLF may update the model metadata with information generated at inference time and supplied by the AnLF. The information may include, for example, average latency and actual resource consumption.
[0108] Figure 9 is a signalling diagram illustrating an example implementation of signalling between different internal functions of the NWDAF 901 to select an ML model for the normal mode and the resource saving mode. The internal functions of the NWDAF 901 comprise an AnLF 902, an MCMLF 903 and an MTLF 904. The signalling illustrated in Figure 9 may be applied to one or more of the methods 200 and 300 and the methods 500 and 600. For example, steps 450 and 850 may be performed utilising the signalling illustrated in Figure 9.
[0109] The example implementation begins at step 910, in which the AnLF 902 transfers relevant parameters in the request for analytics from the analytics consumer, and any other relevant information about the AnLF 902, to the MCMLF 903. For example, the AnLF 902 may transfer the Analytics ID included in the request for analytics from the analytics consumer. For an NWDAF as the network function of methods 200 and 300, the AnLF 902 may transfer the one or more parameters included in the complexity management container. For an NWDAF as the network function of methods 500 and 600, the AnLF 902 may transfer the preferred level of latency. In some examples, the AnLF 902 may transfer the one or more parameters included in the complexity management container and the preferred level of latency.
[0110] In step 920, the MCMLF 903 collects the address and metadata of the available models in the MTLF 904 for the Analytics ID included in the request for analytics.
[0111] In step 930, the MTLF 904 provides the requested information. The metadata may contain parameters that help the MCMLF 903 to make a decision, including model latency and / or resource requirement information.
[0112] In step 940, in addition to the retrieval of existing metadata, the MCMLF 903 subscribes to changes in the metadata or to the availability of new models.
[0113] In step 950, the MCMLF 903 considers the request from the analytics consumer to select a model for normal mode operation. In some examples, the MCMLF 903 may select one or more models to run during the resource saving mode.
[0114] In step 960, the MCMLF 903 provides the information to the AnLF 902. Depending on the implementation, an implicit subscription procedure may be created so that the MCMLF 903 notifies the AnLF 902 if the situation changes, and a new model must be selected instead. This implicit subscription may be handled as a separate request. The implicit approach may be preferred for simplicity. The information provided to the AnLF 902 may include the URL of the model artifact.
[0115] In step 970, the AnLF 902 collects the selected model artifacts. Model artifacts are expected to be stored in an ML Model Repository, which may be part of an Analytics Data Repository Function (ADRF) (see for example TS 23.288 version 18.5.0).
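The interaction of steps 910 to 960 can be sketched, purely for illustration, with two small classes standing in for the MTLF and the MCMLF. The class and method names, the metadata fields, the selection policy and the example URLs are all hypothetical and do not correspond to any standardised service operation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ModelMeta:
    """Model metadata held by the MTLF (hypothetical fields)."""
    url: str           # location of the model artifact
    latency_ms: float
    flops: float
    accuracy: float


class Mtlf:
    """Stand-in for the MTLF: returns metadata per Analytics ID."""

    def __init__(self, catalogue: Dict[str, List[ModelMeta]]) -> None:
        self._catalogue = catalogue

    def get_models(self, analytics_id: str) -> List[ModelMeta]:
        # Corresponds loosely to steps 920 and 930.
        return list(self._catalogue.get(analytics_id, []))


class Mcmlf:
    """Stand-in for the MCMLF: selects one model per working mode."""

    def __init__(self, mtlf: Mtlf) -> None:
        self._mtlf = mtlf

    def select(self, analytics_id: str,
               preferred_latency_ms: float) -> Dict[str, Optional[str]]:
        # Corresponds loosely to steps 950 and 960 (assumed policy).
        models = self._mtlf.get_models(analytics_id)
        eligible = [m for m in models if m.latency_ms <= preferred_latency_ms]
        normal = max(eligible, key=lambda m: m.accuracy, default=None)
        saving = min(eligible, key=lambda m: m.flops, default=None)
        return {
            "normal": normal.url if normal else None,
            "resource_saving": saving.url if saving else None,
        }


mtlf = Mtlf({
    "EXAMPLE_ANALYTICS_ID": [
        ModelMeta("https://example.com/models/full", 80.0, 1e9, 0.95),
        ModelMeta("https://example.com/models/pruned", 30.0, 4e8, 0.90),
        ModelMeta("https://example.com/models/tiny", 5.0, 5e7, 0.80),
    ]
})
mcmlf = Mcmlf(mtlf)
```

In this sketch the AnLF would call `mcmlf.select(...)` with the Analytics ID and the preferred level of latency from the request, and then collect the artifacts at the returned URLs.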
[0116] Figure 10 is a signalling diagram illustrating an example implementation of signalling to keep metadata for an ML model that is stored in the MTLF 1004 up to date. The MTLF 1004 is an internal function in the NWDAF 1001. The NWDAF 1001 may also comprise an AnLF 1002 and an MCMLF 1003. Although the MTLF 1004 may already provide qualitative information about the model complexity or underlying technology (for example, based on neural networks or on ensemble methods), the MTLF 1004 may lack information about the model inference. The AnLF 1002 may notify the MTLF 1004 about the average latency or resource consumption of specific ML models in the inference mode. This operation enables the MTLF 1004 to update the metadata.
[0117] The example implementation begins at step 1010, in which the AnLF 1002 provides information about the ML model performance including latency and / or resource consumption information to the MTLF 1004.
[0118] In step 1020, the MTLF 1004 uses this information to update the model metadata.
[0119] In step 1030, the MTLF 1004 may notify the MCMLF 1003 about the updated metadata.
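Steps 1010 to 1030 can be illustrated with the following sketch, in which inference-time measurements reported by the AnLF are folded into running averages and subscribers (such as the MCMLF) are notified of the update. The class names, the choice of an incremental mean and the CPU percentage metric are assumptions for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ModelMetadata:
    """Inference-time metadata for one ML model (hypothetical fields)."""
    model_id: str
    avg_latency_ms: float = 0.0
    avg_cpu_percent: float = 0.0
    samples: int = 0


class MtlfMetadataStore:
    """Stand-in for the metadata kept by the MTLF."""

    def __init__(self) -> None:
        self._metadata: Dict[str, ModelMetadata] = {}
        self._subscribers: List[Callable[[ModelMetadata], None]] = []

    def subscribe(self, callback: Callable[[ModelMetadata], None]) -> None:
        # E.g. the MCMLF subscribing to metadata changes.
        self._subscribers.append(callback)

    def report_inference(self, model_id: str, latency_ms: float,
                         cpu_percent: float) -> ModelMetadata:
        # Step 1010: measurement supplied by the AnLF.
        meta = self._metadata.setdefault(model_id, ModelMetadata(model_id))
        # Step 1020: update running averages with an incremental mean.
        meta.samples += 1
        meta.avg_latency_ms += (latency_ms - meta.avg_latency_ms) / meta.samples
        meta.avg_cpu_percent += (cpu_percent - meta.avg_cpu_percent) / meta.samples
        # Step 1030: notify subscribers about the updated metadata.
        for notify in self._subscribers:
            notify(meta)
        return meta
```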
[0120] The example implementations above may allow for the network function to switch between operating in normal mode and operating in resource saving mode, as well as adding a preferred level of latency to the analytics consumer’s request for obtaining analytics. The network function is enabled to determine the ML model to use to provide analytics to an analytics consumer that is optimal for the amount of resources available to the network function and / or requested preferred level of latency.
[0121] Figure 11 shows a network node 1100 in accordance with some embodiments. As used herein, network node refers to equipment capable, configured, arranged and / or operable to communicate directly or indirectly with a UE and / or with other network nodes or equipment, in a telecommunication network. The network node 1100 may be operable as a core network node, a core network function or, more generally, a core network entity, such as the core network node QQ108 described above with respect to Figure QQ1. Examples of network nodes in this context include core network entities such as one or more of: a Network Data Analytics Function (NWDAF), any consumer NF, a Mobile Switching Center (MSC), Mobility Management Entity (MME), Home Subscriber Server (HSS), Access and Mobility Management Function (AMF), Session Management Function (SMF), Authentication Server Function (AUSF), Subscription Identifier De-concealing function (SIDF), Unified Data Management (UDM), Security Edge Protection Proxy (SEPP), Network Exposure Function (NEF), Policy Control Function (PCF) and / or a User Plane Function (UPF).
[0122] The network node 1100 includes processing circuitry 1102, a memory 1104, a communication interface 1106, and a power source 1108, and / or any other component, or any combination thereof. The network node 1100 may be composed of multiple physically separate components, which may each have their own respective components. In certain scenarios in which the network node 1100 comprises multiple separate components, one or more of the separate components may be shared among several network nodes.
[0123] The processing circuitry 1102 may comprise a combination of one or more of a microprocessor, controller, microcontroller, central processing unit, digital signal processor, application-specific integrated circuit, field programmable gate array, or any other suitable computing device, resource, or combination of hardware, software and / or encoded logic operable to provide, either alone or in conjunction with other network node 1100 components, such as the memory 1104, network node 1100 functionality. For example, the processing circuitry 1102 may be configured to cause the network node to perform the methods as described with reference to Figures 2, 3, 5 or 6.
[0124] The memory 1104 may comprise any form of volatile or non-volatile computer-readable memory including, without limitation, persistent storage, solid-state memory, remotely mounted memory, magnetic media, optical media, random access memory (RAM), read-only memory (ROM), mass storage media (for example, a hard disk), removable storage media (for example, a flash drive, a Compact Disk (CD) or a Digital Video Disk (DVD)), and / or any other volatile or non-volatile, non-transitory device-readable and / or computer-executable memory devices that store information, data, and / or instructions that may be used by the processing circuitry 1102. The memory 1104 may store any suitable instructions, data, or information, including a computer program, software, an application including one or more of logic, rules, code, tables, and / or other instructions capable of being executed by the processing circuitry 1102 and utilized by the network node 1100. The memory 1104 may be used to store any calculations made by the processing circuitry 1102 and / or any data received via the communication interface 1106. In some embodiments, the processing circuitry 1102 and memory 1104 are integrated.
[0125] The communication interface 1106 is used in wired or wireless communication of signaling and / or data between a network node, access network, and / or UE.
[0126] The power source 1108 provides power to the various components of network node 1100 in a form suitable for the respective components (e.g., at a voltage and current level needed for each respective component). The power source 1108 may further comprise, or be coupled to, power management circuitry to supply the components of the network node 1100 with power for performing the functionality described herein. For example, the network node 1100 may be connectable to an external power source (e.g., the power grid, an electricity outlet) via an input circuitry or interface such as an electrical cable, whereby the external power source supplies power to power circuitry of the power source 1108. As a further example, the power source 1108 may comprise a source of power in the form of a battery or battery pack which is connected to, or integrated in, power circuitry. The battery may provide backup power should the external power source fail.
[0128] Embodiments of the network node 1100 may include additional components beyond those shown in Figure 11 for providing certain aspects of the network node’s functionality, including any of the functionality described herein and / or any functionality necessary to support the subject matter described herein. For example, the network node 1100 may include user interface equipment to allow input of information into the network node 1100 and to allow output of information from the network node 1100. This may allow a user to perform diagnostic, maintenance, repair, and other administrative functions for the network node 1100.
[0129] Figure 12 is a block diagram illustrating a virtualization environment 1200 in which functions implemented by some embodiments may be virtualized. In the present context, virtualizing means creating virtual versions of apparatuses or devices which may include virtualizing hardware platforms, storage devices and networking resources. As used herein, virtualization can be applied to any device described herein, or components thereof, and relates to an implementation in which at least a portion of the functionality is implemented as one or more virtual components. Some or all of the functions described herein may be implemented as virtual components executed by one or more virtual machines (VMs) implemented in one or more virtual environments 1200 hosted by one or more of hardware nodes, such as a hardware computing device that operates as a network node, UE, core network node, or host. Further, in embodiments in which the virtual node does not require radio connectivity (e.g., a core network node or host), then the node may be entirely virtualized. In some embodiments, the virtualization environment 1200 includes components defined by the O-RAN Alliance, such as an O-Cloud environment orchestrated by a Service Management and Orchestration Framework via an O2 interface. Virtualization may facilitate distributed implementations of a network node, UE, core network node, or host.
[0130] Applications 1202 (which may alternatively be called software instances, virtual appliances, network functions, virtual nodes, virtual network functions, etc.) are run in the virtualization environment 1200 to implement some of the features, functions, and / or benefits of some of the embodiments disclosed herein.
[0131] Hardware 1204 includes processing circuitry, memory that stores software and / or instructions executable by hardware processing circuitry, and / or other hardware devices as described herein, such as a network interface, input / output interface, and so forth. Software may be executed by the processing circuitry to instantiate one or more virtualization layers 1206 (also referred to as hypervisors or virtual machine monitors (VMMs)), provide VMs 1208a and 1208b (one or more of which may be generally referred to as VMs 1208), and / or perform any of the functions, features and / or benefits described in relation with some embodiments described herein. The virtualization layer 1206 may present a virtual operating platform that appears like networking hardware to the VMs 1208.
[0132] The VMs 1208 comprise virtual processing, virtual memory, virtual networking or interface and virtual storage, and may be run by a corresponding virtualization layer 1206. Different embodiments of the instance of a virtual appliance 1202 may be implemented on one or more of VMs 1208, and the implementations may be made in different ways. Virtualization of the hardware is in some contexts referred to as network function virtualization (NFV). NFV may be used to consolidate many network equipment types onto industry standard high volume server hardware, physical switches, and physical storage, which can be located in data centers, and customer premise equipment.
[0133] In the context of NFV, a VM 1208 may be a software implementation of a physical machine that runs programs as if they were executing on a physical, non-virtualized machine. Each of the VMs 1208, and that part of hardware 1204 that executes that VM, be it hardware dedicated to that VM and / or hardware shared by that VM with others of the VMs, forms separate virtual network elements. Still in the context of NFV, a virtual network function is responsible for handling specific network functions that run in one or more VMs 1208 on top of the hardware 1204 and corresponds to the application 1202.
[0134] Hardware 1204 may be implemented in a standalone network node with generic or specific components. Hardware 1204 may implement some functions via virtualization. Alternatively, hardware 1204 may be part of a larger cluster of hardware (e.g. such as in a data center or CPE) where many hardware nodes work together and are managed via management and orchestration 1210, which, among others, oversees lifecycle management of applications 1202. In some embodiments, hardware 1204 is coupled to one or more radio units that each include one or more transmitters and one or more receivers that may be coupled to one or more antennas. Radio units may communicate directly with other hardware nodes via one or more appropriate network interfaces and may be used in combination with the virtual components to provide a virtual node with radio capabilities, such as a radio access node or a base station. In some embodiments, some signaling can be provided with the use of a control system 1212 which may alternatively be used for communication between hardware nodes and radio units.
[0135] Although the computing devices described herein (e.g., UEs, network nodes) may include the illustrated combination of hardware components, other embodiments may comprise computing devices with different combinations of components. It is to be understood that these computing devices may comprise any suitable combination of hardware and / or software needed to perform the tasks, features, functions and methods disclosed herein. Determining, calculating, obtaining or similar operations described herein may be performed by processing circuitry, which may process information by, for example, converting the obtained information into other information, comparing the obtained information or converted information to information stored in the network node, and / or performing one or more operations based on the obtained information or converted information, and as a result of said processing making a determination. Moreover, while components are depicted as single boxes located within a larger box, or nested within multiple boxes, in practice, computing devices may comprise multiple different physical components that make up a single illustrated component, and functionality may be partitioned between separate components. For example, a communication interface may be configured to include any of the components described herein, and / or the functionality of the components may be partitioned between the processing circuitry and the communication interface. In another example, non-computationally intensive functions of any of such components may be implemented in software or firmware and computationally intensive functions may be implemented in hardware.
[0136] In certain embodiments, some or all of the functionality described herein may be provided by processing circuitry executing instructions stored in memory, which in certain embodiments may be a computer program product in the form of a non-transitory computer-readable storage medium. In alternative embodiments, some or all of the functionality may be provided by the processing circuitry without executing instructions stored on a separate or discrete device-readable storage medium, such as in a hard-wired manner. In any of those particular embodiments, whether executing instructions stored on a non-transitory computer-readable storage medium or not, the processing circuitry can be configured to perform the described functionality. The benefits provided by such functionality are not limited to the processing circuitry alone or to other components of the computing device but are enjoyed by the computing device as a whole, and / or by end users and a wireless network generally.
[0137] Embodiments described herein therefore enable avoiding excessive and unnecessary compute and energy consumption within an NWDAF containing an AnLF that runs ML models during the inference phase. Embodiments described herein also enable better resource management within an NWDAF containing an AnLF. The embodiments described herein allow for better scalability of an NWDAF containing an AnLF, such that the same NWDAF is capable of serving more NFs in parallel.
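As a purely illustrative aid, the resource-dependent choice between the first and second modes of operation can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `ModelInfo` and `select_mode`, the single scalar resource figure, and the simple "at or above threshold" test are hypothetical and were chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ModelInfo:
    """Hypothetical record describing one trained ML model."""
    name: str
    required_resources: float  # resources needed for inference (arbitrary units)


def select_mode(available: float, threshold: float,
                first: ModelInfo, second: ModelInfo) -> Tuple[int, ModelInfo]:
    """Return (mode, model) for the current resource availability.

    Assumption: the condition is a plain threshold test. Mode 1 is used when
    the available resources are at or above the threshold, mode 2 otherwise.
    """
    if second.required_resources >= first.required_resources:
        raise ValueError("second model must require fewer resources than the first")
    if available >= threshold:
        return 1, first
    return 2, second


# Example values are invented for illustration.
full = ModelInfo("full", 8.0)
light = ModelInfo("compressed", 2.0)
mode, model = select_mode(available=3.0, threshold=8.0, first=full, second=light)
```

In this sketch the lighter model is used only while resources are scarce; a real network function could evaluate the condition per request or periodically, which the sketch leaves open.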
[0138] It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. The word “comprising” does not exclude the presence of elements or steps other than those listed in a claim, “a” or “an” does not exclude a plurality, and a single processor or other unit may fulfil the functions of several units recited in the claims. Any reference signs in the claims shall not be construed so as to limit their scope.
Claims
1. A method, performed by a network function in a communications network, for providing analytics to an analytics consumer in the communications network, the method comprising: receiving (210), from the analytics consumer, a request for the analytics; responsive to available resources at the network function meeting a condition: operating (230) in a first mode of operation by using a first ML model to obtain the analytics, wherein the first ML model requires a first amount of resources to perform inference; and responsive to available resources at the network function not meeting the condition: operating (240) in a second mode of operation by using a second ML model to obtain the analytics, wherein the second ML model requires a second amount of resources to perform inference, wherein the second amount of resources is lower than the first amount of resources; and transmitting (250) the analytics to the analytics consumer.
2. The method of claim 1, wherein the condition comprises a threshold condition.
3. The method as claimed in any one of claims 1 or 2 further comprising transmitting (330) an indication to the analytics consumer of whether the network function is operating in the first mode of operation or the second mode of operation to obtain the analytics.
4. The method of any one of claims 1 to 3 further comprising selecting (420, 820) the first ML model and / or the second ML model.
5. The method of claim 4 further comprising selecting the first ML model for the first mode of operation responsive to receiving the request for analytics from the analytics consumer.
6. The method of claim 4 or 5, further comprising selecting the second ML model for the second mode of operation responsive to receiving the request for analytics from the analytics consumer.
7. The method of claim 4 or 5, further comprising selecting the second ML model for the second mode of operation responsive to the available resources at the network function not meeting the condition.
8. The method of claim 7 further comprising selecting the second ML model to ensure that the second amount of resources is less than or equal to the available resources.
9. The method as claimed in any one of claims 4 to 8 wherein selecting the first ML model and / or the second ML model comprises: obtaining information associated with available ML models at the network function capable of providing the analytics, wherein stored information for an available ML model comprises one or more of: an indication of an amount of resources required by the available ML model to perform inference; an indication of latency of the available ML model in performing inference; and an indication of accuracy of the available ML model in performing inference.
10. The method as claimed in claim 9 further comprising storing the information associated with the available ML models in a model training logical function (904), MTLF, of the network function.
11. The method as claimed in claim 10 further comprising for an available ML model, updating (1010) the information responsive to utilising the available ML model to perform inference.
12. The method of any one of claims 4 to 11, wherein the request for the analytics comprises an indication of a preferred level of latency (810); and selecting (820) the first ML model comprises selecting the first ML model as one that performs inference in less than or equal to the preferred level of latency.
13. The method of claim 12, wherein selecting the second ML model comprises selecting the second ML model as one that performs inference in less than or equal to the preferred level of latency.
14. The method of claim 12 wherein the request for analytics further comprises a preferred maximum latency level; and wherein the method further comprises: selecting the second ML model as one that performs inference in less than or equal to the preferred maximum latency level, wherein the preferred maximum latency level is greater than the preferred level of latency.
15. The method of any one of claims 4 to 14 wherein the request for analytics further comprises a preferred accuracy level; and selecting the first ML model comprises selecting the first ML model as one that provides the preferred accuracy level.
16. The method of claim 15 wherein the request for analytics further comprises a preferred minimum accuracy level; and selecting the second ML model comprises selecting the second ML model as one that provides the preferred minimum accuracy level, wherein the preferred minimum accuracy level is lower than the preferred accuracy level.
17. The method of any one of claims 1 to 16 wherein the request for analytics further comprises an indication of a duration for which the network function is allowed to provide analytics to the analytics consumer utilising the second mode of operation.
18. The method of any one of claims 1 to 17 wherein the request for analytics further comprises an indication of a minimum duration between instances of utilising the second mode of operation.
19. The method of any one of claims 1 to 18, wherein the request for analytics further comprises an indication of a duration for which one or more parameters indicated in the request for analytics are valid.
20. The method of any one of claims 1 to 19 wherein the first amount or second amount of resources comprises one or more of: an amount of floating point operations; an amount of CPU / GPU; and an amount of memory or cache.
21. The method of any one of claims 1 to 20, further comprising: responsive to operating in the first mode of operation transmitting, to the analytics consumer, a notification comprising model parameters of the first ML model; and responsive to operating in the second mode of operation transmitting, to the analytics consumer, a notification comprising model parameters of the second ML model.
22. A method, performed by a network function in a communications network, for providing analytics to an analytics consumer in the communications network, the method comprising: receiving (510), from the analytics consumer, a request for the analytics, wherein the request for the analytics comprises an indication of a preferred level of latency; selecting (520) a first Machine Learning, ML, model that performs inference with less than or equal to the preferred level of latency; obtaining (530), using the first ML model, the analytics; and transmitting (540), to the analytics consumer, the analytics.
23. The method of claim 22, wherein the request for the analytics further comprises an identification of the requested analytics.
24. The method of any one of claims 22 to 23, further comprising transmitting, to the analytics consumer, a notification comprising model parameters of the first ML model.
25. The method as claimed in any one of claims 22 to 24 wherein selecting the first ML model comprises: obtaining information associated with available ML models at the network function capable of providing the analytics, wherein the information for an available ML model comprises one or more of: an indication of an amount of resources required by the available ML model to perform inference; an indication of latency of the available ML model in performing inference; and an indication of accuracy of the available ML model in performing inference.
26. The method as claimed in claim 25 further comprising storing the information associated with the available ML models in a model training logical function, MTLF (904), of the network function.
27. The method as claimed in claim 26 further comprising for an available ML model, updating (1010) the information responsive to utilising the available ML model to perform inference.
28. A method performed by an analytics consumer in a communications network for receiving analytics from a network function, the method comprising: transmitting (310), to the network function, a request for the analytics; receiving (320), from the network function, the analytics; and receiving (330), from the network function, an indication to the analytics consumer of whether the network function is operating in the first mode of operation or the second mode of operation to obtain the analytics.
29. A method performed by an analytics consumer in a communications network for receiving analytics from a network function, the method comprising: transmitting (610), to the network function, a request for the analytics, wherein the request for the analytics comprises an indication of a preferred level of latency; receiving (620), from the network function, the analytics, wherein the analytics are obtained from a first Machine Learning, ML, model that performs inference with less than or equal to the preferred level of latency.
30. A network function (1100) in a communications network for providing analytics to an analytics consumer in the communications network, the network function comprising processing circuitry (1102) and a memory (1104), the memory containing instructions executable by the processing circuitry whereby the network function is operable to: receive (210), from the analytics consumer, a request for the analytics; responsive to available resources at the network function meeting a condition: operate (230) in a first mode of operation by using a first ML model to obtain the analytics, wherein the first ML model requires a first amount of resources to perform inference; responsive to available resources at the network function not meeting the condition: operate (240) in a second mode of operation by using a second ML model to obtain the analytics, wherein the second ML model requires a second amount of resources to perform inference, wherein the second amount of resources is lower than the first amount of resources; and transmit (250) the analytics to the analytics consumer.
31. The network function as claimed in claim 30 wherein the memory further contains instructions executable by the processing circuitry whereby the network function is operable to perform the method as claimed in any one of claims 2 to 21.
32. A network function (1100) in a communications network for providing analytics to an analytics consumer in the communications network, the network function comprising processing circuitry (1102) and a memory (1104), the memory containing instructions executable by the processing circuitry whereby the network function is operable to: receive (510), from the analytics consumer, a request for the analytics, wherein the request for the analytics comprises an indication of a preferred level of latency; select (520) a first Machine Learning, ML, model that performs inference with less than or equal to the preferred level of latency; obtain (530), using the first ML model, the analytics; and transmit (540), to the analytics consumer, the analytics.
33. The network function as claimed in claim 32 wherein the memory further contains instructions executable by the processing circuitry whereby the network function is operable to perform the method as claimed in any one of claims 23 to 27.
34. An analytics consumer (1100) in a communications network for receiving analytics from a network function, the analytics consumer comprising processing circuitry (1102) and a memory (1104), the memory containing instructions executable by the processing circuitry whereby the analytics consumer is operable to: transmit (310), to the network function, a request for the analytics; receive (320), from the network function, the analytics; and receive (330), from the network function, an indication to the analytics consumer of whether the network function is operating in the first mode of operation or the second mode of operation to obtain the analytics.
35. An analytics consumer (1100) in a communications network for receiving analytics from a network function, the analytics consumer comprising processing circuitry (1102) and a memory (1104), the memory containing instructions executable by the processing circuitry whereby the analytics consumer is operable to: transmit (610), to the network function, a request for the analytics, wherein the request for the analytics comprises an indication of a preferred level of latency; receive (620), from the network function, the analytics, wherein the analytics are obtained from a first Machine Learning, ML, model that performs inference with less than or equal to the preferred level of latency.
36. A computer program, comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out a method according to any of claims 1 to 29.
37. A carrier containing the computer program according to claim 36, wherein the carrier comprises one of an electronic signal, optical signal, radio signal or computer readable storage medium.
38. A computer-readable medium comprising instructions that, when executed on at least one processor, cause the at least one processor to perform the method according to any of claims 1 to 29.
39. A computer program product comprising non-transitory computer readable media having stored thereon a computer program according to claim 36.
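For illustration only, the model selection recited in claims 9 and 12 to 16 could be sketched in Python as below. This is a hedged sketch rather than the claimed method: `ModelRecord`, `pick_model`, the field names, the "most accurate candidate wins" tie-break, and all example numbers are assumptions introduced here and do not appear in the application.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelRecord:
    """Hypothetical stored information for one available ML model."""
    name: str
    resources: float   # resources required for inference (arbitrary units)
    latency_ms: float  # inference latency
    accuracy: float    # inference accuracy in [0, 1]


def pick_model(models: List[ModelRecord], max_latency_ms: float,
               min_accuracy: float, available: float) -> Optional[ModelRecord]:
    """Return the most accurate model meeting all three bounds, or None.

    Assumption: among the qualifying models the most accurate one is
    preferred; the claims do not prescribe a tie-break.
    """
    candidates = [m for m in models
                  if m.latency_ms <= max_latency_ms
                  and m.accuracy >= min_accuracy
                  and m.resources <= available]
    return max(candidates, key=lambda m: m.accuracy, default=None)


# Invented example data: a full model and a compressed variant.
models = [ModelRecord("full", 8.0, 20.0, 0.95),
          ModelRecord("quantized", 2.0, 10.0, 0.88)]

# First try the preferred latency and accuracy levels; if nothing fits the
# available resources, relax to the maximum latency and minimum accuracy.
chosen = pick_model(models, 25.0, 0.90, available=3.0)
if chosen is None:
    chosen = pick_model(models, 40.0, 0.80, available=3.0)
```

With the example data, the preferred bounds cannot be met within the available resources, so the relaxed bounds select the compressed model.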