Testing and measurement system for evaluating machine learning models, including one or more test devices, testing and measurement procedures, and computer program

The test and measurement system evaluates distributed learning models in 3GPP networks by simulating realistic conditions and user-defined specifications, addressing the need for quality assessment in mobile communication systems.

DE102025110461B3 · Undetermined · Publication Date: 2026-07-02 · ROHDE & SCHWARZ GMBH & CO KG

Patent Information

Authority / Receiving Office
DE · DE
Patent Type
Patents
Current Assignee / Owner
ROHDE & SCHWARZ GMBH & CO KG
Filing Date
2025-03-18
Publication Date
2026-07-02

AI Technical Summary

Technical Problem

Existing technologies lack a standardized and efficient method to evaluate the quality of distributed machine learning models in mobile communication systems, particularly in 3GPP networks, which are crucial for optimizing network operations and ensuring data privacy and security.

Method used

A test and measurement system is developed to evaluate machine learning models by providing stimulus data to training devices, receiving learning model data, and assessing the quality of local and aggregated models, incorporating simulators for realistic conditions and user-defined specifications.

Benefits of technology

The system provides comprehensive evaluation of learning model quality, ensuring accuracy, generalizability, and robustness, supporting efficient integration into distributed systems while maintaining data privacy and security.



Abstract

Proposed are a test and measurement system for evaluating learning models of one or more test devices, a test and measurement procedure, and a computer program. A test and measurement system (100) for evaluating learning models of one or more test devices (102; 104; 106; 108) comprises a test and measurement device (10) with one or more interfaces (12) configured for data communication with the one or more test devices (102; 104; 106; 108). The test and measurement device (10) further comprises one or more computing units (14) configured to generate stimulus data for the one or more test devices (102; 104; 106; 108). The one or more computing units (14) are configured to provide the stimulus data to the one or more test devices (102; 104; 106; 108) via the one or more interfaces (12), so that the one or more test devices (102; 104; 106; 108) train one or more local machine learning models based on the stimulus data, and to receive training model data about the one or more trained machine learning models from the one or more test devices (102; 104; 106; 108) via the one or more interfaces (12). The one or more computing units (14) are further configured to evaluate the quality of the one or more trained machine learning models based on the training model data.

Description

Technical field

The present disclosure relates to a test and measurement system for evaluating machine learning models of one or more test devices, a test and measurement procedure, and a computer program, in particular, but not exclusively, to a concept for assessing the quality of trained learning models in test devices that are integrated into a distributed machine learning concept in a mobile communication system.

Background

Distributed machine learning plays a central role in 3GPP (3rd Generation Partnership Project) systems for optimizing and further developing modern mobile networks. With the introduction of 5G (5th generation) and future 6G networks, machine learning methods are becoming increasingly important, especially for network management, resource allocation, and predictive maintenance. One of the key challenges is analyzing large amounts of network information in real time and deriving patterns without compromising user privacy. Distributed learning makes it possible to train models directly on end devices or at the network edge (edge nodes) instead of collecting all raw data centrally. This not only reduces network traffic but also increases the security and efficiency of data processing.

A concrete example of the use of distributed machine learning in 3GPP networks is federated learning. Here, individual models are trained at local nodes before their parameters are transmitted to a central server, where they are aggregated into a global model update. This approach is particularly advantageous for applications such as adaptive modulation and coding (AMC) or for optimizing handovers between radio cells. Because the computations are performed decentrally, network operators can offer a personalized user experience without sharing the underlying user data with central servers. This makes a significant contribution to data privacy compliance and latency reduction.

Besides network control, distributed machine learning is also used in security mechanisms.
Local models can detect anomalies in network traffic and identify potential threats such as DDoS (Distributed Denial of Service) attacks at an early stage. The combination of edge computing and machine learning creates robust attack detection mechanisms that can dynamically adapt to new threat scenarios. Future 6G systems will further expand this approach and enable even deeper integration of AI-powered optimization mechanisms into network operations. Standardization work within 3GPP is therefore crucial for developing interoperable and efficient solutions for distributed machine learning in mobile networks.

Further technical background can be found in the following documents:
  • WO 2024/064 022 A1, TRUSTWORTHY LEVEL CONTROL OF AI/ML MODELS TRAINED IN WIRELESS NETWORKS.
  • US 12 008 075 B2, TRAINING FEDERATED LEARNING MODELS.
  • HU, Shuyan [et al.]: Distributed Machine Learning for Wireless Communication Networks: Techniques, Architectures, and Applications. In: IEEE Communications Surveys & Tutorials, Vol. 23, 2021, No. 3, pp. 1458-1493. ISSN 1553-877X. https://doi.org/10.1109/COMST.2021.3086014
  • VAN TRUONG, Vo [et al.]: Performance Evaluation of Decentralized Federated Learning: Impact of Fully and K-Connected Topologies, Heterogeneous Computing Resources, and Communication Bandwidth. In: IEEE Access, Vol. 13, 2025, pp. 32741-32755. ISSN 2169-3536. https://doi.org/10.1109/ACCESS.2025.3542772
  • LI, Mu [et al.]: Parameter Server for Distributed Machine Learning. https://www.cs.cmu.edu/~muli/file/ps.pdf
  • LI, Mu [et al.]: Scaling Distributed Machine Learning with the Parameter Server. In: FLINN, Jason and LEVY, Hank: Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation, October 6-8, 2014, Broomfield, CO. Berkeley, CA: USENIX Association, 2014 (ACM other conferences). pp. 583-598. ISBN 978-1-931971-16-4. https://www.usenix.org/conference/osdi14/technical-sessions/presentation/li_mu
Summary

Exemplary embodiments of the present disclosure are based on the core idea that the quality of the distributed learning models trained by the individual devices involved in a distributed machine learning system is crucial for the overall quality or benefit of a distributed machine learning algorithm. A further insight is that a test and measurement device can be used to specifically verify and evaluate the training of a learning model in a test device by providing appropriate stimulus data and subsequently assessing the learning model. Multiple test devices can also be integrated into the process, allowing the impact of distributed machine learning to be assessed.

Exemplary embodiments provide a test and measurement system for evaluating machine learning models of one or more test devices. The system comprises a test and measurement device with one or more interfaces for communicating data with the one or more test devices. The test and measurement device further comprises one or more computing units configured to generate stimulus data for the one or more test devices and to provide this stimulus data to the one or more test devices via the one or more interfaces, so that the one or more test devices train one or more local machine learning models based on the stimulus data. Furthermore, the test and measurement system is designed to receive learning model data about the one or more trained machine learning models from the one or more test devices via the one or more interfaces, and to evaluate the quality of the one or more trained machine learning models based on this learning model data. In this respect, the system allows the quality of the learning models trained in the test devices to be tested or evaluated.
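
The sequence described above (generate stimulus data, provide it to the test devices, receive learning model data, evaluate quality) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the disclosure: the names `generate_stimulus`, `SimulatedTestDevice`, `evaluate_quality`, and `run_test` are hypothetical, the "learning model" is a single weight fitted by gradient descent, and quality is scored as mean squared error on reference data held back by the tester.

```python
import random


def generate_stimulus(n, true_w=2.0, seed=0):
    """Hypothetical stimulus generator: n pairs (x, y) with y = true_w * x."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, true_w * x) for x in xs]


class SimulatedTestDevice:
    """Stand-in for a test device that trains a one-weight linear model."""

    def __init__(self):
        self.w = 0.0

    def train(self, stimulus, lr=0.1, epochs=50):
        # Plain stochastic gradient descent on the squared error (w*x - y)^2.
        for _ in range(epochs):
            for x, y in stimulus:
                self.w -= lr * 2.0 * (self.w * x - y) * x

    def report_model_data(self):
        # The learning model data is just the single trained weight here.
        return {"w": self.w}


def evaluate_quality(model_data, holdout):
    """Mean squared error of the reported model on held-out reference data."""
    w = model_data["w"]
    return sum((w * x - y) ** 2 for x, y in holdout) / len(holdout)


def run_test(devices, n_train=100, n_holdout=50):
    stimulus = generate_stimulus(n_train, seed=1)    # generate stimulus data
    holdout = generate_stimulus(n_holdout, seed=2)   # reference data kept by the tester
    scores = []
    for device in devices:
        device.train(stimulus)                       # provide stimulus, device trains
        model_data = device.report_model_data()      # receive learning model data
        scores.append(evaluate_quality(model_data, holdout))  # evaluate quality
    return scores
```

Calling `run_test([SimulatedTestDevice()])` returns one quality score per device. In a real setup the device would be reached through an interface rather than a method call, and the learning model data would be far richer than one weight.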
The test and measurement device can also be configured to aggregate the training model data, or the one or more local learning models, from the one or more test devices and to evaluate the quality of the local machine learning models based on the aggregated data or an aggregated model. In this respect, the quality of the aggregated learning model, that is, the learning model based on multiple trained models from multiple test devices, can also be evaluated. Additionally or alternatively, the test and measurement device can simulate one or more additional clients (further (test) devices) to obtain additional machine learning models, aggregate the machine learning models from the one or more test devices with the additional machine learning models, and evaluate the quality of the learning models based on the aggregated data or the aggregated machine learning model. By combining actual and simulated learning models, or their data, a number of trained learning models can thus be generated that allows aggregated learning models (learning models based on aggregated data) to be evaluated, especially when the number of aggregated learning models is high (for example, more than 5, 10, or 50 test devices).

In further embodiments, the test and measurement system can also include a radio-frequency interface, or an emulator for a radio-frequency interface, to establish a connection with the one or more test devices. This allows for more realistic test conditions, as real or emulated radio-frequency effects can be taken into account. One or more connections for coupling the one or more test devices via a radio-frequency cable can also be provided.

Optionally, the system can also include a server simulator that simulates a data server for federated learning in conjunction with the one or more test devices and is configured for data exchange with them.
In this way, a learning model that is ultimately trained by a server based on aggregated data from the test devices can also be evaluated and tested.

In some embodiments, the test and measurement system can also include a unidirectional or bidirectional channel simulator for simulating a time-variant or time-invariant transmission channel between the test and measurement system and the one or more test devices. For example, the channel simulator can be configured to simulate one or more effects from the group of linear distortions, nonlinear distortions, bit errors, communication delays, and/or data throughput limitations. In this respect, embodiments can also cost-effectively account for realistic effects of a mobile communication channel during evaluation.

Exemplary embodiments of the test and measurement system also include one or more device simulators for simulating one or more additional devices with additional learning models. The device simulator can be configured to simulate one or more elements of a group consisting of one or more additional test devices, one or more reference devices, one or more unreliable devices, one or more interfering devices, or one or more adversarial/attacking devices. In this way, exemplary embodiments can also take into account the influence of other devices with different properties or intentions in the evaluation. Furthermore, the device simulator can be configured to receive a user-defined model specification, a user-defined model training specification, a specification for a model test, and/or a user-defined trained model via a software interface. User-specific models can therefore be considered when simulating additional devices.

In some embodiments, the test and measurement system includes a software interface for communicating a user-defined model specification and/or a specification for a model test. User-specific settings, specifications, or preferences can then be incorporated into the evaluation.
A software interface can also be used to import a user-defined trained model. Furthermore, one or more software interfaces can be provided for communicating a user-defined model specification, a user-defined model aggregation, and/or a specification for a model test. Some exemplary embodiments can therefore offer the user extensive customization options.

The test and measurement system can also include a monitoring and reporting unit. This allows the user to be informed directly by the device. The monitoring and reporting unit can be configured to display report information on a built-in display, so that results are shown directly on the test and measurement device. For example, the monitoring and reporting unit is configured to monitor and report the sequence of data transmissions required for the training process, the time of transmission, the frequency of data transmission, the size of the data packets used for transmission, and/or qualitative or quantitative parameters indicating the progress of the training. This offers users a wide variety of evaluation options.

Furthermore, one or more connections for coupling the one or more test devices via a radio-frequency cable can be provided. Alternatively, radio communication over the air interface is also conceivable. In this respect, signal processing in the radio-frequency range can be included in the evaluation.

Generally, in exemplary embodiments, the one or more additional components of the test and measurement system described herein (e.g., simulation and/or communication components such as simulators and units) can be permanently installed in the test and measurement system, and the test and measurement system can be equipped with an interface for communication with the one or more test devices. In another exemplary embodiment, the test and measurement system is coupled with one or more test devices.
Furthermore, exemplary embodiments provide a test and measurement method for evaluating learning models of one or more test devices. The method comprises generating stimulus data for the one or more test devices, wherein the one or more test devices use machine learning models, and providing the stimulus data to the one or more test devices to train one or more machine learning models based on the stimulus data. The method further comprises receiving training model data about the one or more trained machine learning models from the one or more test devices and evaluating the quality of the one or more trained machine learning models based on the training model data.

Another embodiment is a computer program and/or a machine-readable medium containing program code for carrying out one of the methods described herein when the program code is executed on a computer, a processor, or a programmable hardware component.

Brief description of the figures

Some examples of devices, methods, and/or computer programs are explained in more detail below with reference to the accompanying figures. These show:
  • Fig. 1: a block diagram of an embodiment of a test and measurement system for evaluating learning models of one or more test devices;
  • Fig. 2: a flowchart of an embodiment of a method for evaluating learning models of one or more test devices;
  • Fig. 3: an embodiment with distributed learning and a parameter server;
  • Fig. 4: an embodiment with decentralized distributed learning; and
  • Fig. 5: another embodiment of a test and measurement system.

Detailed description

Some examples are now described in more detail with reference to the accompanying figures. However, other possible examples are not limited to the features of these detailed embodiments. These may include modifications of the features, as well as equivalents and alternatives to the features. Furthermore, the terminology used herein to describe certain examples should not be considered restrictive for other possible examples.
Identical or similar reference symbols throughout the description of the figures refer to identical or similar elements or features, which may be implemented in an identical or modified form while providing the same or a similar function. Furthermore, the thickness of lines, layers, and/or areas in the figures may be exaggerated for clarity.

When two elements A and B are combined using "or", this is to be understood as disclosing all possible combinations, i.e., only A, only B, and A and B, unless explicitly defined otherwise in a specific case. As an alternative formulation for the same combinations, "at least one of A or B" or "A and/or B" can be used. This applies equivalently to combinations of more than two elements.

When a singular form, e.g., "a", "an", and "the", is used, and the use of only a single element is neither explicitly nor implicitly defined as mandatory, further examples may also use multiple elements to implement the same function. If a function is subsequently described as being implemented using multiple elements, further examples may implement the same function using a single element or a single processing entity. It is further understood that the terms "comprises", "comprising", "has", and/or "having", when used, describe the presence of the specified features, integers, steps, operations, processes, elements, components, and/or a group thereof, but do not exclude the presence or addition of one or more other features, integers, steps, operations, processes, elements, components, and/or a group thereof.

Fig. 1 shows a block diagram of an embodiment of a test and measurement system 100 for evaluating machine learning models of one or more test devices 102, 104, 106, 108. The test and measurement system 100 comprises a test and measurement device 10 with one or more interfaces 12 for communicating data with the one or more test devices 102-108. The one or more interfaces 12 are coupled to one or more computing units 14.
The one or more computing units 14 are configured to generate stimulus data for the one or more test devices 102-108. The one or more test devices 102-108 use machine learning models. The stimulus data is provided to the one or more test devices 102-108 via the one or more interfaces 12 so that the one or more test devices 102-108 train one or more local machine learning models based on the stimulus data. Furthermore, the one or more computing units 14 are configured to receive training model data about the one or more trained machine learning models from the one or more test devices 102-108 via the one or more interfaces 12. Finally, the one or more computing units 14 are configured to evaluate the quality of the one or more trained machine learning models based on the training model data.

For example, in telecommunications systems such as 3GPP systems, various distributed machine learning models are used to optimize network capacity, improve security mechanisms, and develop energy-efficient communication strategies. A particularly widespread approach is federated learning (FL), in which models are trained locally on individual devices or network nodes without transmitting the raw data to a central instance. This enables privacy-friendly optimization of mobile networks, for example, to improve resource allocation or predict network utilization. Through iterative aggregation of the locally trained models, a central control unit can derive a global model without directly accessing user data.

Another relevant concept is split learning, in which the training of a neural network is distributed between end devices and central servers. While the first layers of the neural network are processed locally on the device, the deeper layers are processed on powerful servers.
This allows for a balance between data privacy and computing power, because only abstract feature representations are transmitted instead of sensitive raw data. This method is particularly suitable for applications such as predictive network maintenance or the personalized delivery of network services based on user behavior. Combining these different learning models can increase the efficiency of 3GPP mobile networks and support a future-proof network infrastructure.

A learning model in the context of machine learning, particularly in distributed machine learning within 3GPP systems, is described by various data categories. First, the model architecture is a key aspect, defining the type of model, such as neural networks, decision trees, or support vector machines (SVMs). This includes the number of layers and neurons, the activation functions used (such as ReLU or softmax), and various hyperparameters, including learning rate, batch size, and regularization methods. Weight initialization and parameter distribution also play a crucial role, as they can influence the model's convergence and performance.

Besides the architecture, the training methodology is crucial for the performance of a learning model. This includes a description of the dataset, including the type, quantity, and source of the data, such as sensor data from 5G networks or user mobility patterns. Data preprocessing is performed through normalization, feature engineering, or data augmentation. Depending on the type of learning (supervised, unsupervised, or reinforcement learning), different optimization algorithms such as stochastic gradient descent (SGD) or Adam are used. Metrics such as accuracy, precision, recall, and the F1 score are used to evaluate the model, while the loss function, for example, mean squared error (MSE) or cross-entropy, governs the model's adaptation to the training data.
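
The two loss functions named above can be written down in a few lines. This is a minimal sketch using their textbook definitions; the function names and the `eps` clamp are illustrative choices, not part of the disclosure.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_entropy(true_class, probs, eps=1e-12):
    """Negative log-probability assigned to the correct class.

    `probs` is a list of predicted class probabilities; `eps` guards
    against log(0) and is an illustrative choice.
    """
    return -math.log(max(probs[true_class], eps))
```

For example, `mean_squared_error([1, 2, 3], [1, 2, 5])` averages the squared errors 0, 0, and 4, and `cross_entropy(0, [0.5, 0.5])` equals ln 2, the loss of an undecided two-class prediction.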
In some embodiments, the test and measurement system first provides stimulus data to the test devices. This data can be training data or input data for the learning model used by the respective test device. The stimulus data is then fed into the test device's learning model, for example, as training data or as input, in order to obtain corresponding output values. At least in some embodiments, the learning model is also modified, for example, trained, by the input data, which itself carries information.

A learning model is represented by various parameters that determine its structure, functionality, and performance. These parameters can generally be divided into three categories: model parameters, hyperparameters, and evaluation metrics.

Model parameters are values learned during the training process. These include, in particular, the weights and bias values of a neural network, which determine the strength and direction of the connections between the neurons. These parameters are iteratively adjusted using optimization methods such as stochastic gradient descent (SGD) or Adam to minimize prediction errors. Feature coefficients in linear models and decision boundaries in classification models also play a crucial role, as they influence the representation of the data within the model.

In contrast, hyperparameters are not learned through training but must be predefined. These include the learning rate, which determines how much the model parameters change per iteration. An excessively high learning rate can lead to instability, while an excessively low rate can make convergence very slow. Other important hyperparameters are the batch size, which defines the number of training examples per optimization step, and the number of layers and neurons in neural networks, which define the model complexity.
Similarly, the choice of activation functions (e.g., ReLU, sigmoid, softmax) and regularization methods (e.g., L1/L2 regularization, dropout) influences the model's generalizability.

In addition to model parameters and hyperparameters, evaluation metrics can also play a role in describing a learning model. These include the loss function (e.g., mean squared error for regression or cross-entropy for classification), which measures the difference between predicted and actual values. Furthermore, performance metrics such as accuracy, precision, recall, and the F1 score are used to assess the quality of predictions. In continuously learning systems, adaptation mechanisms such as learning rate adjustment or early stopping are also relevant to avoid overfitting and ensure an efficient training strategy. Together, these parameters determine the efficiency, accuracy, and applicability of a learning model, especially in distributed machine learning methods such as those used in 3GPP networks.

Finally, runtime and operational data are also important, especially for integrating the model into a distributed system. Model size affects memory consumption, while computational effort is assessed using FLOPS (floating-point operations per second) or latency per prediction. In edge computing environments, power consumption is a critical factor, as mobile devices have limited resources. A model's adaptability determines whether it can be updated continuously or incrementally to adapt to new network requirements. Depending on the deployment environment (edge servers, mobile devices, or the cloud), mechanisms for data protection and aggregation must also be considered to ensure the security of sensitive user data.

In exemplary embodiments, the test devices receive the stimulus data and feed it to their learning models. The test devices then communicate learning model data back to the test and measurement system.
This learning model data could include any parameters, weights, feature vectors, or other quantities describing the learning model. The test and measurement system can then evaluate the learning models based on the learning model data. For example, the effects that training with the stimulus data has on a learning model can be tracked and/or evaluated.

The test devices 102-108 can be end devices of a mobile communication system, typically designated as UE (user equipment) or DUT (device under test). The test and measurement system is therefore primarily aimed at mobile communication devices of modern telecommunications networks, such as those specified by 3GPP, for example, 5G, 6G, and subsequent generations. An evaluation of the learning models is particularly relevant when implementing distributed machine learning systems, as explained in more detail below.

The quality of a learning model can be evaluated using various metrics and procedures that measure both its accuracy and generalizability. These can be broadly categorized into performance metrics, validation methods, and robustness analyses, all of which can be performed on the learning model data.

Performance metrics are a crucial aspect of the evaluation and vary depending on the use case. For classification models, accuracy is frequently used, which is the proportion of correctly predicted classes. However, this metric is not always meaningful with unbalanced datasets, which is why complementary metrics such as precision, recall, and the F1 score are employed. While precision indicates the proportion of positive predictions that are actually correct, recall measures how many of the actual positive cases were correctly identified. The F1 score combines both measures as a harmonic mean and is particularly suitable for scenarios with unbalanced class distributions.
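
The classification metrics described above follow directly from the counts of true and false positives and negatives. The sketch below uses the standard definitions for a binary problem; the function name `classification_metrics` and the convention of returning 0.0 when a denominator is zero are illustrative assumptions.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 score for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Convention (assumption): report 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

The unbalanced case shows why accuracy alone can mislead: with two positive and eight negative samples, a model that always predicts the negative class reaches an accuracy of 0.8 while its recall and F1 score are both 0.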
In regression models, on the other hand, mean squared error (MSE) or mean absolute error (MAE) is used, each of which measures the average deviation between predicted and actual values.

In addition to performance metrics, validation can be crucial for assessing model quality. A common method is cross-validation, where the dataset is divided into several subsets, allowing the model to be iteratively evaluated on different training and test sets. The holdout method, where a fixed portion of the data is reserved for testing, is a simpler alternative. To detect overfitting, a train-test error comparison is often performed: a model with a significantly lower error on the training data than on the test data may generalize poorly.

Additionally, the robustness of the model can be analyzed to ensure that it performs well not only on the specific training data. This includes adversarial testing, where deliberately altered inputs are used to verify the stability of the predictions, as well as fairness and bias analyses, which check that the model does not exhibit any unwanted distortions. The computational effort, including the latency per prediction and memory consumption, is also a crucial quality criterion, especially for models used in real-time systems such as 3GPP networks. By combining these evaluation methods, a comprehensive picture of the quality of a learning model can be obtained, helping to ensure that it is both efficient and generalizable for real-world applications.

In exemplary embodiments, the one or more interfaces 12 of the test and measurement device 10 can correspond to any means for obtaining, receiving, transmitting, or providing analog or digital signals or information, e.g., a plug, contact, pin, register, input terminal, output terminal, conductor, conductor track, antenna, etc., that enables the provision of a signal.
An interface can be wireless or wired, and it can be configured to communicate with other internal or external components, i.e., to transmit or receive signals or information. In the present case, the one or more interfaces 12 can, for example, be configured to transmit information about the stimulus data and the learning model data, at least in part. In general, the one or more interfaces enable communication between the test and measurement device 10 and other components of the test and measurement system 100, in particular with the test devices 102, 104, 106, 108. These can also make use of mobile networks or other wireless network access and include corresponding transmitter components, receiver components, gateways, etc.

In exemplary embodiments, the one or more computing units 14 can be configured for digital signal processing. They can be implemented as one or more processing units, one or more processing devices, any means of processing, any means of determination, or any means of calculation, such as a processor, a computer, or a programmable hardware component that can be operated with appropriately adapted software. For example, the one or more computing units can also include memory that holds corresponding queries, query catalogs, responses, instructions, etc. The described function of the one or more computing units 14 can also be implemented in software, which is then executed on one or more programmable hardware components. Such hardware components can include a general-purpose processor, a digital signal processor (DSP), a microcontroller, etc.

Fig. 2 illustrates a flowchart of a test and measurement procedure 20 for evaluating learning models of one or more test devices 102-108. The procedure 20 comprises generating 22 stimulus data for one or more test devices 102-108, wherein the one or more test devices 102-108 use machine learning models.
The procedure 20 further includes providing 24 the stimulus data to the one or more test devices 102-108 to train one or more machine learning models based on the stimulus data. This is followed by receiving 26 training model data about the one or more trained machine learning models from the one or more test devices 102-108 and finally evaluating 28 the quality of the one or more trained machine learning models based on the training model data.

These exemplary embodiments create a test system for federated or distributed learning in wireless communication networks. Machine learning is an essential element for 5G and later generations of mobile networks. Potential applications of distributed and federated learning include, for example, improving power management, resource allocation, selecting modulation and coding schemes, and choosing QoS (Quality of Service) parameters. Currently, these applications are neither standardized nor specified. The examples given make no distinction between distributed and federated learning, or between online and offline training. The term "distributed learning" is used for both distributed and federated learning. Two categories can be distinguished here:
  • distributed learning with parameter servers, and
  • decentralized distributed learning.

Distributed learning with a parameter server is illustrated in Fig. 3, which shows an embodiment of distributed learning with a parameter server. Several test devices communicate local learning parameters, such as weights and gradients, to a server, as indicated by the solid arrows. The server evaluates the data and communicates global updates of the learning parameters, such as weights and gradients, back to the test devices, as indicated by the dashed arrows in Fig. 3. The test and measurement system 100 directly accesses these interfaces and can monitor the corresponding communication and draw appropriate conclusions for evaluating the learning models in the test devices.
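
The server-side step in Fig. 3, turning several local updates into one global update, can be sketched as follows. The text does not prescribe an aggregation rule; this sketch assumes sample-weighted averaging of weight vectors (the rule commonly known as federated averaging), and the function name `aggregate_updates` and the input format are hypothetical.

```python
def aggregate_updates(updates):
    """Sample-weighted average of local weight vectors.

    `updates` is a list of (weights, num_samples) tuples, one per test
    device, where `weights` is a list of floats of equal length.
    This is an assumed rule, not one specified by the disclosure.
    """
    if not updates:
        raise ValueError("at least one local update is required")
    total_samples = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [
        sum(weights[i] * n for weights, n in updates) / total_samples
        for i in range(dim)
    ]
```

For two devices reporting `([1.0, 2.0], 1)` and `([3.0, 4.0], 3)`, the device with three times as many samples pulls the global weights toward its own values, giving `[2.5, 3.5]`. A tester that observes both the uploads and the global update could compare the server's output against such a reference computation.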
Each agent (test device) trains its own local model. The (intermediate) results of the training, such as model weights or gradients, are uploaded to the server. The server aggregates the uploaded data and sends a global update to each agent. This behavior can also be replicated by the test and measurement device 10. Furthermore, the test and measurement device 10 can be configured to aggregate (combine) the training model data or the one or more local training models from the one or more test devices and to evaluate the quality of the local machine learning models based on the aggregated data or the aggregated training model.
In a further embodiment, the test and measurement system can also be configured to simulate one or more additional clients (agents, test devices, or even the server) in order to obtain additional learning model data, to aggregate the learning model data (learning models) from the one or more test devices and the additional learning model data (learning models), and to evaluate the quality of the learning models based on the aggregated data or the aggregated learning model. In the present embodiment, the stimulus data for the test devices can therefore also include the data communicated by the server.
As will be explained in more detail below, the test and measurement system 100 can include various additional components, in particular simulators for various network components, in order to create the most realistic test environment possible for the test devices. For example, the test and measurement system 100 may also include a server simulator for simulating a data server for federated learning in conjunction with the one or more test devices 102-108, which is designed to exchange data with the one or more test devices. Although it may seem the obvious choice, in the centralized model (Fig. 3) the server is not necessarily implemented in a base station of a mobile communication system.
It could also be implemented in an end device, for example, if an end device manufacturer does not wish to share its proprietary know-how with other manufacturers and/or base station manufacturers.
A fully decentralized learning approach is illustrated in Fig. 4, which shows an embodiment of decentralized distributed learning. Fig. 4 initially depicts three agents or test devices k, m, and n, which exchange updates of learning parameters, e.g., weights, gradients, etc. Each agent (test device) trains its own local model. The (intermediate) results of the training, e.g., model weights or gradients, are sent to the agent's neighbors. Upon receiving the updates, each agent aggregates the received data and sends an update to its neighbors. As further shown in Fig. 4, the test and measurement system 100 can take the place of an agent (test device) in order to evaluate, based on the updates, the learning models of the test devices k and m communicating with it. In this case, the stimulus data are learning model data of the test device n simulated in the test and measurement system. Training a two-sided model, e.g., for CSI compression, can also be interpreted as decentralized distributed learning with two agents. For further details on methods and applications of distributed learning as such, see, for example, https://ieeexplore.ieee.org/document/9446488.
Exemplary embodiments of the measurement system 100 perform measurements that are common to the most likely applications and/or create a precisely defined and reproducible environment that is necessary for carrying out such tests and/or measurements. Tests that are common to most applications include logging communication packets between agents or between agents and the server and measuring the frequency of updates, the total volume of communication, the overhead caused by training, response times, latencies, etc.
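The communication measurements listed above (packet logging, update frequency, total communication volume) can be sketched as a small log summary. This is a minimal illustration under stated assumptions: the record layout `PacketRecord`, the function `summarize_traffic`, and the choice of statistics are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PacketRecord:
    """One logged training-related packet (hypothetical record layout)."""
    timestamp_s: float
    sender: str
    receiver: str
    size_bytes: int


def summarize_traffic(log: List[PacketRecord]) -> Dict[str, float]:
    """Compute simple communication statistics from a packet log."""
    if not log:
        return {"packets": 0, "total_bytes": 0,
                "updates_per_s": 0.0, "mean_gap_s": 0.0}
    times = sorted(p.timestamp_s for p in log)
    duration = times[-1] - times[0]
    # Time between consecutive packets, used as a simple spacing measure.
    gaps = [b - a for a, b in zip(times, times[1:])]
    return {
        "packets": len(log),
        "total_bytes": sum(p.size_bytes for p in log),
        # Number of inter-packet intervals per second of observed time.
        "updates_per_s": (len(log) - 1) / duration if duration > 0 else 0.0,
        "mean_gap_s": sum(gaps) / len(gaps) if gaps else 0.0,
    }


# Hypothetical log: agent "k" uploads four updates to the server.
example_log = [
    PacketRecord(0.0, "k", "server", 100),
    PacketRecord(0.5, "k", "server", 200),
    PacketRecord(1.0, "k", "server", 100),
    PacketRecord(2.0, "k", "server", 400),
]
summary = summarize_traffic(example_log)
```

Per-agent or per-link statistics could be obtained by filtering the log on `sender` and `receiver` before summarizing.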
The well-defined and reproducible environment includes, but is not limited to, a time-varying, potentially lossy channel, imperfect analog hardware, processing delays (including communication latency), limitations on available bandwidth or maximum throughput, etc. The environment can also simulate additional traffic, including traffic from unreliable and/or adversarial agents. The test results are displayed or evaluated in the measuring device or transferred via an external interface for further evaluation.
In exemplary implementations, a machine learning model (ML model) for channel estimation can, for example, be trained by the test device (DUT) based on downlink channel realizations generated by the test and measurement system 100. Furthermore, in other embodiments, the test and measurement system 100 can also include a unidirectional or bidirectional channel simulator for simulating a time-variable or time-constant transmission channel between the test and measurement device 10 and the one or more test devices. The channel simulator can, for example, be configured to simulate one or more effects from the group of linear distortions, nonlinear distortions, bit errors, communication delays, and/or data throughput limitations. For example, a wireless downlink with fast fading is simulated. The channel simulator can therefore be configured to simulate the RF channel at various levels of detail. Analog hardware can also be used. For example, bit errors in the training data can be simulated, as well as activation and/or response times, so that lagging issues can also be analyzed. Such effects can also be simulated or caused by bandwidth limitation.
In some embodiments, the test and measurement system 100 may further include a high-frequency interface (HF interface) or an emulator for a high-frequency interface to establish a connection with the one or more test devices or to emulate or simulate an HF connection.
For example, baseband IQ values can be transmitted via an IP connection, and a baseband transmission can be simulated accordingly. In some embodiments, at least one HF connection can thus be established between the device under test (one or more test devices) and the test and measurement system 100, or at least an emulator can be provided that emulates an HF connection between the DUT(s) and the test and measurement system 100.
As described above, the test and measurement system 100 also includes one or more device simulators for simulating one or more additional devices with additional learning models. A device simulator can be implemented, for example, as specialized hardware, a specialized software environment, or a specialized hardware/software combination. The device simulator enables the virtual replication of the functions and behaviors of a real device (e.g., mobile device, test device, reference device, etc.) by emulating essential hardware components involved in machine learning, as well as communication interfaces to cellular networks, WLAN, and Bluetooth. This realistic simulation allows various application scenarios, such as different network latencies, signal strengths, or user interactions, to be tested risk-free and efficiently via the test and measurement system, without relying on physical devices. This not only promotes the early identification and correction of errors, but also enables targeted optimization of applications, ultimately leading to shorter development times and higher quality of the final product. Specifically, additional devices involved in a supported learning process can be simulated here. The device simulator can be configured to simulate one or more elements of a group consisting of one or more additional test devices, one or more reference devices, one or more unreliable devices, one or more interfering devices, and/or one or more adversary devices (attackers).
This allows for the generation of additional training model data for central aggregation. For example, comparison or reference data can be provided by a reference device to establish a basis for evaluating the (aggregated) training model data. Generally, this also makes it possible to create a more realistic test environment, as unreliable data can be included and any limitations of the available communication bandwidth can be taken into account. For example, an opposing or attacking device can introduce maliciously manipulated or generated data into the process to enable an evaluation in these cases as well.
In further embodiments, a user-defined model specification, user-defined model training, a specification for a model test, and/or a user-defined trained model can also be transmitted to the device simulator via a software interface. This provides the user with a wide range of configuration and testing options. For example, a model structure, a training algorithm, etc., can be specified to the simulator. In particular, the test and measurement system 100 can include a software interface for communicating a user-defined model specification, a user-defined model aggregation, and/or a specification for a model test, through which, for example, model structure and training algorithms can also be specified. Furthermore, a software interface for importing a user-defined trained model can also be provided. The software interfaces described here can be implemented jointly, in groups, or individually, for example as so-called APIs (Application Programming Interfaces), and can also be part of the one or more interfaces described above.
The test and measurement system 100 may, in some embodiments, also include a monitoring and reporting unit. This unit may, for example, be configured to display report positions on a built-in display (monitor, indicator), to store them, and/or to output them via an interface.
For example, the monitoring and reporting unit may be configured to monitor and report the sequence of data transmission required for the training process, optionally the time of transmission, the frequency of data transmission, the size of the data packets used for transmission, and/or qualitative or quantitative parameters indicating the progress of the training. This allows, for example, monitoring of model parameters and/or model updates.
The components described herein, such as interfaces, simulators, displays, etc., can be modular within the test and measurement system; they can be interchangeable or permanently installed in the test and measurement device 10. In some embodiments, the system 100 can be implemented as a one-piece (one-box) measuring device with permanently integrated components. The individual components can be equipped with an interface for communication with one or more test devices; these can be several individual interfaces or shared interfaces. Furthermore, in some embodiments, the test and measurement system 100 may also include additional interfaces for communication between the components. During test and measurement operation, the test and measurement system is then coupled to one or more test devices via the corresponding one or more interfaces.
The measurement setup for an embodiment of a test and measurement system 100 is shown in Fig. 5. The test and measurement system 100 comprises several computing units. The test and measurement device 10 (processing unit) contains a simulator (test device simulator 52) for at least one agent (simulated test device), which participates in the distributed learning together with the device under test (test device 54). The server simulator 56 simulates, for example, a server at the base station in the case of distributed learning with a parameter server. The channel simulator 58 simulates a well-defined and reproducible environment.
The simulator 60 for additional mobile devices optionally simulates additional traffic as well as adversarial and/or unreliable agents. The results are evaluated by the monitoring and reporting unit 62.
The preceding description primarily focused on the functionality of the measurement system 100. The following section examines the data transmission between the test device and the measurement system in more detail.
Examples of implementations can provide a conformance or production test of the (online) federated or distributed learning capabilities of a mobile device. This involves creating a system for testing and measuring one or more devices under test (DUTs), each of which executes at least one machine learning algorithm. This algorithm can be implemented, for example, as a neural network. The system includes at least one processing unit (computing unit) that repeatedly (at least once) receives (possibly intermediate) data generated during an iteration of the DUT's training process. It then computes the data required for the next iteration of the training process and transmits the computed data (stimulus data) to the DUT(s). This data includes, for example, model weights, gradients, etc. Updates to weights and gradients can then be transmitted. Optionally, a (simulated or predefined) reference device can be used for comparison or as a benchmark. Furthermore, at least one RF connection can be established between the device under test (DUT) and the test and measurement system, or at least one emulator can be present that emulates an RF connection between the DUT(s) and the test and measurement equipment. For example, baseband IQ values can be transmitted via an IP connection. The system can optionally include a simulator for a server participating in the test to enable model aggregation for centralized model training.
In further embodiments, the system can include a channel simulator that simulates unidirectional or bidirectional, potentially time-varying transmission channels. The channel simulator simulates, for example, linear and/or nonlinear distortions. Analog hardware can be used to simulate a radio channel. The channel simulator can additionally or alternatively simulate bit errors that may occur in the training data. Furthermore, communication delays can optionally be simulated, such as activation or response times, lagging issues, etc. The channel simulator can also simulate throughput limitations, such as those caused by bandwidth restrictions.
Additionally or alternatively, a simulator for additional mobile devices can be provided, for example, to limit the available bandwidth or to simulate an unreliable client (additional test device, endpoint). An additional mobile device is, for example, an unreliable or adversary device that attempts to disrupt the learning algorithm or training process using manipulated data. This is also referred to as data poisoning. An additional mobile device can also serve as a reference device to enable benchmarking. The simulator for a user terminal can also offer a software interface for user-defined model specifications and training; examples include model structure and training algorithm. The simulator for a user terminal can also offer a software interface for importing a user-defined trained model.
The processing unit can be permanently integrated into the test and measurement system, allowing the system to be offered as a complete solution. Furthermore, the processing unit can include one or more software interfaces for user-defined model specification and training, such as model structure, training algorithm, etc. The simulator can also provide one or more software interfaces for importing a user-defined trained model.
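Two of the channel-simulator effects named above, bit errors in the transmitted training data and throughput or delay limitations, can be illustrated with a minimal sketch. The source does not describe how these effects are modeled; independent bit flips with a fixed bit error rate and a simple latency-plus-serialization delay are assumptions, and the names `inject_bit_errors` and `transfer_time_s` are hypothetical.

```python
import random


def inject_bit_errors(payload: bytes, ber: float, seed: int = 0) -> bytes:
    """Flip each bit of the payload independently with probability `ber`.

    Assumption: errors are independent and identically distributed; a
    real channel simulator may model bursts or fading-dependent errors.
    A fixed seed keeps the impairment reproducible between test runs.
    """
    if not 0.0 <= ber <= 1.0:
        raise ValueError("ber must be in [0, 1]")
    rng = random.Random(seed)
    corrupted = bytearray(payload)
    for i in range(len(corrupted)):
        for bit in range(8):
            if rng.random() < ber:
                corrupted[i] ^= 1 << bit
    return bytes(corrupted)


def transfer_time_s(payload_len_bytes: int, throughput_bps: float,
                    latency_s: float) -> float:
    """Delay of one transfer: fixed latency plus serialization time."""
    if throughput_bps <= 0:
        raise ValueError("throughput must be positive")
    return latency_s + 8 * payload_len_bytes / throughput_bps


# A hypothetical 1000-byte model update over an 8 kbit/s link with 0.5 s latency.
delay = transfer_time_s(1000, 8000.0, 0.5)
noisy_update = inject_bit_errors(b"model-update", ber=0.01, seed=42)
```

Seeding the random generator is one way to obtain the reproducible environment the description calls for, since the same impairment pattern can be replayed against different test devices.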
The device under test can be connected to the test and measurement system via an RF cable or wirelessly via an air interface. The server-side simulator can also offer a software interface for user-defined model aggregation.
In other embodiments, a monitoring and reporting unit can be included. This unit displays report positions on a built-in display and/or saves the data to files or outputs it via an external interface. In some embodiments, the monitoring and reporting unit may also monitor and report the sequence of data transmission required for the training process (e.g., model parameters, model updates). The timing of data transmissions required for the training process can thus be monitored and reported. The monitoring and reporting unit optionally monitors and reports the frequency of data transmission required for the training process. It can also monitor and report the size of the data packets used for transmission. The monitoring and reporting unit may also monitor and report qualitative and/or quantitative parameters that indicate the progress of the training.
The following describes some specific test scenarios. The following test setup is used:
• The test equipment (TE, test and measurement system) emulates a base station (with the device simulator) of a mobile communication system (e.g., a gNodeB);
• one or more UEs are connected to the TE; and
• one or more additional UEs can be simulated in the TE.
1. Test case: AI/ML (Artificial Intelligence / Machine Learning) CSI (Channel State Information) feedback compression (compression of the channel state feedback).
Training:
• The TE generates downlink channel data (stimulus data from a channel simulator) for one or more UEs (e.g., based on stochastic channel models or ray tracing).
• The UEs locally train a complete autoencoder (e.g., based on a neural network such as a fully connected or convolutional neural network).
• The UEs transmit data (training model data) to the TE that is required for training a suitable decoder (e.g., uncompressed input data and compressed data in latent space).
• The TE trains a suitable decoder that is interoperable with the encoders trained in the UEs.
Monitoring:
• The TE monitors the learning process of the UEs.
• The TE monitors the data exchange between the TE and the UEs and, if applicable, between the UEs themselves (e.g., the time and order of messages, the amount of data, the duration of the exchange, etc.).
• The TE monitors the performance of the trained models and the development of that performance over time:
◯ Measurement of downlink throughput.
◯ Measurement of the reconstruction accuracy of the decompressed CSI (e.g., cosine similarity). The ground-truth CSI (reference value) can be determined from the generated downlink channel (known data from the channel simulator). A quality measure for the learning models can be derived from the reconstruction accuracy.
2. Test case: AI/ML positioning (localization) using fingerprinting of the CIR (Channel Impulse Response). The channel impulse response characteristic of a specific position is used to determine that position.
Training:
• The TE emulates a position in space by generating a corresponding downlink channel.
◯ The position can be specified by a user and is thus known as a label in the UEs.
◯ The position can be transmitted to the UEs as a label via a side channel (e.g., user data in the downlink, generated GPS (Global Positioning System) signal, image).
• The UEs train local models based on the estimated downlink channel and the position label.
• The UEs send the trained local models back to the TE.
• The TE aggregates the models learned by the UEs and sends the global model back to the UEs.
Monitoring:
• The TE monitors the learning process of the UEs.
• The TE monitors the data exchange between the TE and the UEs and, if applicable, between the UEs themselves (e.g., the time and order of messages, the amount of data, the duration of the exchange, etc.).
• If the UEs transmit the estimated position back to the TE, the TE can check the accuracy of the positioning against the known ground-truth position label and determine a measure of the quality of the estimated position and thus of the learning models.
3. Test case: Channel estimation / MIMO (Multiple-Input Multiple-Output) detection.
Training:
• The TE emulates a downlink channel including pilot symbols (e.g., 5G NR DMRS (New Radio, Demodulation Reference Symbols)).
• The UEs use the pilot symbols for non-AI/ML-based channel estimation / MIMO detection (e.g., least squares, maximum likelihood). The pilot symbols and the result of the channel estimation / MIMO detection can be used by the UEs as a training dataset for AI/ML-based channel estimation / MIMO detection.
• The UEs train local models for AI/ML-based channel estimation / MIMO detection.
• The UEs send the trained local models back to the TE.
• The TE aggregates the models learned by the UEs and sends the global model back to the UEs.
Monitoring:
• The TE monitors the learning process of the UEs.
• The TE monitors the data exchange between the TE and the UEs and, if applicable, between the UEs themselves (e.g., the time and order of messages, the amount of data, the duration of the exchange, etc.).
• The TE monitors the performance of the trained models and the development of that performance over time:
◯ Measurement of downlink throughput.
In general, the following aspects can be covered by the TE in various test scenarios:
• The TE emulates additional UEs, possibly including unreliable and interfering UEs.
• The TE emulates transmission delay.
• The TE emulates the data throughput over the channel.
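The reconstruction-accuracy measure mentioned for the first test case, cosine similarity between the ground-truth CSI and the decompressed CSI, can be written out as a short sketch. The representation of the CSI as a flat list of complex channel coefficients and the function name `cosine_similarity` are assumptions for illustration; the magnitude of the normalized inner product is used so that a common phase rotation does not reduce the score, which is one common convention rather than something the source specifies.

```python
import math
from typing import Sequence


def cosine_similarity(h_true: Sequence[complex],
                      h_rec: Sequence[complex]) -> float:
    """Magnitude of the normalized inner product of two CSI vectors.

    Returns a value in [0, 1]; 1 means the reconstruction matches the
    ground truth up to a complex scale factor.
    """
    if len(h_true) != len(h_rec) or len(h_true) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    inner = sum(a.conjugate() * b for a, b in zip(h_true, h_rec))
    norm_true = math.sqrt(sum(abs(a) ** 2 for a in h_true))
    norm_rec = math.sqrt(sum(abs(b) ** 2 for b in h_rec))
    if norm_true == 0.0 or norm_rec == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return abs(inner) / (norm_true * norm_rec)


# Hypothetical ground truth from the channel simulator and a UE reconstruction.
ground_truth = [1 + 1j, 2 - 1j, 0.5j]
reconstruction = [0.9 + 1.1j, 2.1 - 0.8j, 0.4j]
score = cosine_similarity(ground_truth, reconstruction)
```

Because the TE generates the downlink channel itself, the ground-truth vector is known exactly, so a score tracked over training iterations could serve as the quality measure described for this test case.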
The aspects and features described in connection with one of the previous examples can also be combined with one or more of the further examples to replace an identical or similar feature of that further example or to additionally introduce the feature into the further example.
Examples can also include a (computer) program with program code for executing one or more of the above procedures, or refer to such a program when executed on a computer, processor, or other programmable hardware component. Steps, operations, or processes of various procedures described above can therefore also be executed by programmed computers, processors, or other programmable hardware components. Examples can also include program storage devices, such as digital data storage media, that are machine-, processor-, or computer-readable and encode or contain machine-executable, processor-executable, or computer-executable programs and instructions. The program storage devices can, for example, include or be digital storage devices, magnetic storage media such as magnetic disks and magnetic tapes, hard disk drives, or optically readable digital data storage media. Further examples may also include computers, processors, control units, (field-)programmable logic arrays ((F)PLAs), (field-)programmable gate arrays ((F)PGAs), graphics processing units (GPUs), application-specific integrated circuits (ASICs), integrated circuits (ICs), or system-on-a-chip (SoC) systems programmed to perform the steps of the procedures described above.
It is further understood that the disclosure of several steps, processes, operations, or functions disclosed in the description or claims should not be interpreted as necessarily occurring in the described sequence, unless explicitly stated in a specific case or required for technical reasons. Therefore, the preceding description does not restrict the execution of multiple steps or functions to a specific sequence.
Furthermore, in other examples, a single step, function, process, or operation may include and/or be broken down into multiple sub-steps, functions, processes, or operations.
If certain aspects described in the preceding sections relate to a device or system, these aspects should also be understood as a description of the corresponding procedure. For example, a block, device, or functional aspect of the device or system may correspond to a feature, such as a process step, of the corresponding procedure. Similarly, aspects described in relation to a procedure should also be understood as a description of a corresponding block, element, property, or functional feature of the corresponding device or system.
The following claims are hereby included in the detailed description, each claim being a separate example. It should also be noted that, although a dependent claim may refer to a specific combination with one or more other claims, other examples may include a combination of the dependent claim with the subject matter of any other dependent or independent claim. Such combinations are hereby explicitly proposed unless it is stated in a specific case that a particular combination is not intended. Furthermore, features of a claim are also to be included for each other independent claim, even if that claim is not directly defined as dependent on that other independent claim.

Claims

1. A test and measurement system (100) for evaluating machine learning models of one or more test devices (102; 104; 106; 108), comprising a test and measurement device (10) with one or more interfaces (12) designed for communicating data with the one or more test devices (102; 104; 106; 108); and one or more computing units (14) configured to generate stimulus data for the one or more test devices (102; 104; 106; 108), wherein the stimulus data are training data or input data for the learning model used by the respective test device, to provide the stimulus data to the one or more test devices (102; 104; 106; 108) via the one or more interfaces (12) in order to train one or more local machine learning models by the one or more test devices (102; 104; 106; 108) based on the stimulus data, to receive learning model data about the one or more trained machine learning models from the one or more test devices (102; 104; 106; 108) via the one or more interfaces (12), and to evaluate the quality of the one or more trained machine learning models based on the learning model data, wherein the test and measurement system (100) further comprises one or more device simulators (52; 60) for simulating one or more additional devices with additional learning models, wherein the device simulator (52; 60) is configured to simulate, as an additional device, one or more elements of the group of one or more additional test devices, one or more reference devices, one or more unreliable devices, one or more interfering devices, or one or more adversary devices.
2. The test and measurement system (100) according to claim 1, wherein the test and measurement system is further configured to aggregate the one or more local machine learning models using the stimulus data from the one or more test devices (102; 104; 106; 108) and to evaluate the quality of the local machine learning models on the basis of the aggregated one or more local machine learning models.
3. The test and measurement system (100) according to one of claims 1 or 2, wherein the test and measurement device (10) is further configured to simulate one or more additional test devices, to obtain additional machine learning models, and to aggregate the machine learning models from the one or more test devices (102; 104; 106; 108) and the additional machine learning models to obtain an aggregated learning model and to evaluate the quality of the local learning models on the basis of the aggregated learning model.
4. The test and measurement system (100) according to one of claims 1 to 3, which further comprises a high-frequency interface or an emulator for a high-frequency interface to establish a connection with the one or more test devices (102; 104; 106; 108).
5. The test and measurement system (100) according to one of claims 1 to 4, which further comprises a server simulator (56) for simulating a data server for federated learning in conjunction with the one or more test devices (102; 104; 106; 108), which is configured for data exchange with the one or more test devices (102; 104; 106; 108).
6. The test and measurement system (100) according to one of claims 1 to 5, which further comprises a uni- or bidirectional channel simulator (58) for simulating a time-variable or time-constant transmission channel between the test and measurement device (10) and the one or more test devices (102; 104; 106; 108), wherein the channel simulator (58) is configured to simulate one or more effects from the group consisting of linear distortions, nonlinear distortions, bit errors, communication delays, or limitations of a data throughput.
7. The test and measurement system (100) according to one of claims 1 to 6, wherein one of the one or more device simulators (52; 60) is configured to receive a user-defined model specification, a user-defined model training, a specification for a model test, and/or a user-defined trained model via a software interface.
8. The test and measurement system (100) according to one of claims 1 to 7, which further comprises a software interface for communicating a user-defined model specification and/or a specification for a model test.
9. The test and measurement system (100) according to one of claims 1 to 8, which further comprises a software interface for importing a user-defined trained model.
10. The test and measurement system (100) according to one of claims 1 to 9, which further comprises one or more software interfaces for communicating a user-defined model specification, a user-defined model aggregation, and/or a specification for a model test.
11. The test and measurement system (100) according to one of claims 1 to 10, which further comprises a monitoring and reporting unit (62).
12. The test and measurement system (100) according to claim 11, wherein the monitoring and reporting unit (62) is configured to display report positions on a built-in display.
13. The test and measurement system (100) according to one of claims 11 or 12, wherein the monitoring and reporting unit (62) is configured to monitor and report the sequence of transmission of the data required for a training process, the time of transmission of the data required for the training process, the frequency of transmission of the data required for the training process, the size of packets used for the transmission of the data required for the training process, and/or qualitative or quantitative parameters that indicate progress of the training process.
14. The test and measurement system (100) according to one of claims 1 to 13, which further comprises one or more connections for coupling the one or more test devices (102; 104; 106; 108) via a radio frequency cable.
15. The test and measurement system (100) according to one of claims 1 to 14, wherein one or more additional simulation and/or communication components are permanently installed in the test and measurement device (10) and are equipped with an interface for communication with the one or more test devices (102; 104; 106; 108).
16. The test and measurement system (100) according to one of claims 1 to 15, coupled with the one or more test devices (102; 104; 106; 108).
17. A test and measurement procedure (20) for evaluating learning models of one or more test devices (102; 104; 106; 108), comprising the following steps: generating (22) stimulus data for the one or more test devices (102; 104; 106; 108), wherein the one or more test devices (102; 104; 106; 108) use machine learning models and wherein the stimulus data are training data or input data for the learning model used by the respective test device; providing (24) the stimulus data to the one or more test devices (102; 104; 106; 108) in order to train one or more local machine learning models by the one or more test devices (102; 104; 106; 108) based on the stimulus data; receiving (26) learning model data about the one or more trained machine learning models from the one or more test devices (102; 104; 106; 108); evaluating (28) the quality of the one or more trained machine learning models based on the learning model data; and simulating one or more additional devices with additional learning models, wherein the additional device is one or more elements of the group of one or more further test devices, one or more reference devices, one or more unreliable devices, one or more interfering devices, or one or more adversary devices.
18. A computer program with program code for carrying out the test and measurement procedure (20) according to claim 17, when the program code is executed on a computer, a processor, or a programmable hardware component.
19. A machine-readable medium containing program code for performing the test and measurement procedure (20) according to claim 17, when the program code is executed on a computer, a processor, or a programmable hardware component.