Method for determining an output signal by means of a machine learning system
By combining Gaussian processes and neural networks, and utilizing encoders and decoders to process the contextual information of input and output signals, the problem of neural networks being unable to determine prediction uncertainty is solved, thereby improving the prediction accuracy and reliability of machine learning systems.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ROBERT BOSCH GMBH
- Filing Date
- 2021-09-30
- Publication Date
- 2026-06-19
AI Technical Summary
The neural networks in modern machine learning systems often cannot determine well-calibrated uncertainties for their predictions, so they report high confidence even when predictions are incorrect or inaccurate.
By combining Gaussian processes and neural networks, employing an encoder-decoder approach, and utilizing the accumulation and weighting of multiple latent representations, contextual information is introduced to determine the expected value and variance of the output signal. Two separate neural networks are used to process the input and output signals.
It improves the performance of machine learning systems, enabling them to more accurately determine the uncertainty of predictions and enhance the reliability and accuracy of predictions.
Figure CN114386449B_ABST
Description
Technical Field
[0001] The present invention relates to a method for determining an output signal by means of a machine learning system, a method for training a machine learning system, a training system, a computer program, and a machine-readable storage medium.
Background Technology
[0002] Marta Garnelo et al., "Conditional Neural Processes" (https://arxiv.org/abs/1807.01613v1), published on July 4, 2018, discloses a method for determining output signals using conditional neural processes.
[0003] Advantages of the present invention
[0004] Many at least partially automated devices or systems utilize machine learning systems for automated operation. For example, a robot can use a machine learning system to classify its surroundings, such as by using camera images of the surrounding environment to classify objects within that environment. Alternatively, it is conceivable that the robot might use the machine learning system for regression instead of classification, for example, to determine distances to objects to be classified using camera images of the surrounding environment.
[0005] Machine learning systems are typically constructed to determine, based on an input signal (e.g., an input image), a prediction about that input signal, such as the type and location of an object or its distance. In the following text, the performance of a machine learning system can be understood as a value characterizing the average ability of the system to correctly predict the desired outcome.
[0006] Modern machine learning systems are typically based on neural networks because they often achieve very high performance across a wide range of technical problems. A drawback of neural networks is that they often cannot determine the well-calibrated uncertainty of their predictions. This means that even when predictions are incorrect or inaccurate, neural networks often output a high degree of certainty that the prediction is correct.
[0007] The advantage of a machine learning system configured to perform the method according to the invention is that it can determine well-calibrated uncertainties about predictions. Advantageously, this property is obtained through a combination of features from Gaussian processes and neural networks. Compared to other combinations of Gaussian processes and neural networks, such as conditional neural processes, the machine learning system configured to perform the method according to the invention achieves better performance.
Summary of the Invention
[0008] In a first aspect, the present invention relates to a computer-implemented method for determining a first output signal by means of a machine learning system, wherein the first output signal characterizes a classification and / or regression of a first input signal and the output signal comprises: a first representation characterizing the expected value of the classification and / or the regression; and a second representation characterizing the variance of the classification and / or the regression, wherein the method comprises the following steps for the determination:
[0009] • An encoder of the machine learning system is used to determine a plurality of latent representations, wherein a latent representation of the plurality of latent representations is determined based on at least one second input signal and a second output signal corresponding to the second input signal, wherein the second input signal and the second output signal characterize the context of the first input signal, and wherein the latent representation includes a first representation characterizing an expected value and a second representation characterizing a variance;
[0010] • A third representation is determined based on the first representations of the latent representations of the plurality of latent representations, wherein the third representation characterizes an accumulation of these first representations;
[0011] • A fourth representation is determined based on the second representations of the latent representations of the plurality of latent representations, wherein the fourth representation characterizes an accumulation of these second representations;
[0012] • The first output signal is determined by means of the decoder of the machine learning system, wherein the decoder determines the first output signal based on the third representation, the fourth representation and the first input signal.
[0013] This method can be understood as determining the classification and / or regression of at least one first input signal in the form of a first output signal, wherein the first output signal is determined not only based on the first input signal but also within the context of at least one second input signal and the second output signal corresponding to that second input signal. Preferably, multiple second input signals and the corresponding second output signals are used as context. The second input signal or signals can be regarded as being contextually related to the first input signal.
[0014] The second output signal can be understood as representing the classification and / or regression of the corresponding second input signal. Therefore, the context provided to the machine learning system can be understood as indicating to the machine learning system, for a specific input signal, what kind of output signal is correct, expected, or acceptable.
[0015] Compared to other machine learning systems that determine the first output signal based solely on the first input signal, this machine learning system receives significantly more information by including the second input signal and the second output signal. Therefore, the second input signal and the corresponding second output signal can be understood as orientation points, based on which the machine learning system determines the first output signal for the first input signal.
[0016] In this context, the term "context" can be understood as a relationship in which the second input signal and the second output signal, or these second input signals and the second output signals, are related to the first input signal.
[0017] For example, it can be envisioned that multiple second input signals represent time points, while the second output signals represent sampling points of an audio signal at the corresponding time points. The first input signal could, for example, represent a time point for which no sampling point exists and for which the machine learning system should predict the sampling point. The first output signal can then be determined by the machine learning system within the context of the second input signals and the second output signals.
[0018] In another example, it is conceivable that the future movement of objects detected in an image should be determined based on a video sequence of images and the objects detected in those images. In this case, the images in the video sequence or their capture timestamps can be understood as a second input signal, with the object's location being the corresponding second output signal. For example, a future time point might be chosen as the first input signal, and the first output signal can then be determined at that future time point, and with the aid of that context (i.e., the second input signal and the second output signal).
[0019] The context of the first and second input signals can be designed in various ways. For example, it is conceivable that the second input signal includes the pixels of an image whose pixel values are known, while the first input signal contains the pixels of the image whose pixel values are unknown. In this case, the machine learning system can be configured to determine the first output signal, i.e., the unknown pixel values, based on the second input signal and the second output signal, i.e., the known pixels and their pixel values.
[0020] In particular, it is conceivable that multiple first output signals for multiple first input signals are determined based on the same context. In the video sequence example, the positions of objects at different future points in time can be determined based on the same set of past points in time.
[0021] The first output signal may characterize the expected value and variance of at least one real-valued regression. Alternatively, it is conceivable that the output signal may also characterize the classification and the uncertainty associated with that classification. For example, a first representation of the first output signal may include a vector comprising multiple Logit values for corresponding classes. A second representation may include a vector of real values, where each of these real values characterizes the variance, or uncertainty, of one of these Logit values.
[0022] The first and second input signals can, in particular, be parts of a sequence, as described above in these examples. Typically, the first and second input signals can be understood as being generated by a random process.
[0023] Therefore, this method can determine a first output signal with respect to a first input signal based on a second input signal and a second output signal that maintain a contextual relationship with the first input signal. Thus, this method advantageously extracts information not only from the first input signal but also from the second input signal and the second output signal that maintain a contextual relationship with the first input signal.
[0024] It is conceivable that the first input signal and / or the second input signal include at least a portion of an image, particularly a portion of an image determined by means of a sensor, such as a camera sensor, a LiDAR sensor, a radar sensor, an ultrasonic sensor, or a thermal imager. Alternatively, the image may be artificially generated by means of a computer-based method, such as by means of a virtual world already modeled in a computer or by means of machine learning methods. Alternatively or additionally, it is conceivable that the first input signal and / or the second input signal include at least a portion of an audio signal, particularly a portion of an audio signal determined by means of a microphone. Alternatively, it is conceivable that the audio signal is artificially generated, for example by means of digital synthesis by a computer or by means of machine learning methods. Alternatively or additionally, it is conceivable that the first input signal and / or the second input signal include sensor records of the machine's sensors, particularly sensor records of sensors that determine power consumption and / or voltage and / or rotational speed and / or velocity and / or temperature and / or pressure and / or force.
[0025] A representation can generally be understood as a single or multiple numerical values. In particular, a representation can be a scalar, vector, matrix, or tensor. It is also conceivable that a representation consists of multiple scalars and / or multiple vectors and / or multiple matrices and / or multiple tensors. A latent representation includes, for example, a first and a second representation.
[0026] A latent representation can be understood as one that characterizes information contained in a second input signal and a second output signal corresponding to that second input signal. This information can be understood as at least partially characterizing the context. Here, the form of the latent representation can be determined by the encoder. For example, it is conceivable that the first and second representations are vectors or tensors, respectively.
[0027] The third and fourth representations can be understood as enabling the two representations to jointly characterize the context representation, that is, the accumulated information of the context.
[0028] The encoder of a machine learning system can be understood as the encoder that determines a latent representation for a second input signal and a second output signal, respectively.
[0029] A key advantage over other context-based machine learning systems is that this method allows the latent representation to be understood as derived from random variables. This enables the expression of uncertainty regarding the precise value of the latent representation. For example, a second representation of the latent representation can characterize high variance. This can be understood as making the encoder's precise latent representation of the second input signal and the second output signal uncertain.
[0030] Compared to known methods, including uncertainty allows for a significantly better accumulation of latent representations. This is because the determined latent representations can be weighted according to the determined uncertainty when determining the third and fourth representations, i.e., the context representation. Thus, latent representations characterizing high variance and thereby high uncertainty can advantageously be excluded from the accumulation or included in it with low weights. Consequently, the context representation contains significantly more information. The inventors found that the described method leads to performance improvements in machine learning systems.
[0031] It is also possible for the fourth representation to be determined according to a first formula
[0032] $$\sigma_z^2 = \left[ \left(\sigma_0^2\right)^{\ominus} + \sum_{i=1}^{n} \left(\sigma_i^2\right)^{\ominus} \right]^{\ominus}$$
[0033] where $\sigma_z^2$ is the fourth representation, $\sigma_0^2$ is an a priori assumption about the fourth representation, $\sigma_i^2$ is the second representation, determined by the encoder, of the latent representation of the $i$-th second input signal and the $i$-th second output signal of the $n$ second input signals and second output signals, and $\ominus$ denotes the element-wise reciprocal.
[0034] The element-wise reciprocal can be understood as replacing each value in the representation with its reciprocal.
[0035] The first formula can also be understood as determining the element-wise reciprocals of the second representation first, adding these element-wise reciprocals together, adding the prior representation to the sum, and redetermining the element-wise reciprocals from the result.
[0036] Determining the fourth representation according to the first formula accumulates the second representations based on the uncertainty they characterize. Thus, second representations characterizing high uncertainty are advantageously considered to a lesser extent than second representations characterizing high certainty.
[0037] Another advantage of determining the fourth representation according to the first formula is that, in addition to the determined second representations, prior knowledge in the form of a prior representation for the fourth representation can be included in the determination. This improves the fourth representation because further prior knowledge can be incorporated into the predictions of the machine learning system. Consequently, the context representation improves, and with it the performance of the machine learning system.
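The accumulation of variances described in paragraphs [0034] to [0037] can be sketched in plain Python. This is only an illustration under stated assumptions, not the patented implementation: the function name `aggregate_variance` is hypothetical, representations are modelled as flat lists of floats, and the prior is assumed to enter as its element-wise reciprocal (for an all-ones prior this is identical to adding the prior itself).

```python
def aggregate_variance(latent_variances, prior_variance):
    """Accumulate per-context variances element-wise into one context variance.

    latent_variances: list of second representations (one list of floats each).
    prior_variance:   prior assumption about the fourth representation.
    """
    dim = len(prior_variance)
    # Assumption: the prior contributes its reciprocal (a "prior precision").
    precision = [1.0 / v for v in prior_variance]
    for variance in latent_variances:
        if len(variance) != dim:
            raise ValueError("all representations must have the same length")
        for d in range(dim):
            # Reciprocal of each second representation is added to the sum.
            precision[d] += 1.0 / variance[d]
    # Final element-wise reciprocal gives the accumulated variance.
    return [1.0 / p for p in precision]


# Two context points; the second is very uncertain in dimension 1.
variances = [[0.5, 4.0], [0.5, 100.0]]
context_var = aggregate_variance(variances, prior_variance=[1.0, 1.0])
```

In this sketch, a latent representation with a large variance contributes a small reciprocal and therefore barely changes the accumulated result, which is the weighting effect the text describes.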
[0038] It is also possible for the third representation to be determined according to a second formula
[0039] $$\mu_z = \mu_0 + \sigma_z^2 \odot \sum_{i=1}^{n} \left(r_i - \mu_0\right) \oslash \sigma_i^2$$
[0040] where $\mu_z$ is the third representation, $\mu_0$ is an a priori assumption about the third representation, $r_i$ is the first representation, determined by the encoder, of the latent representation of the $i$-th second input signal and the $i$-th second output signal of the $n$ second input signals and second output signals, $\sigma_z^2$ is the fourth representation, $\sigma_i^2$ is the second representation, determined by the encoder, of the latent representation of the $i$-th second input signal and the $i$-th second output signal, $\odot$ denotes element-wise multiplication, and $\oslash$ denotes element-wise division.
[0041] The second formula can be understood as a weighted sum of the first representations of the latent representations, where the second representations serve as weighting factors. In addition, the prior assumption about the third representation is added to this sum to determine the third representation.
[0042] Determining the third representation according to the second formula accumulates the first representations based on the uncertainty characterized by the corresponding second representations. Thus, first representations associated with high uncertainty are advantageously considered to a lesser extent than first representations associated with high certainty.
[0043] Another advantage of determining the third representation according to the second formula is that, in addition to the determined first representations, prior knowledge in the form of a prior representation for the third representation can be included in the determination. This improves the third representation because further prior knowledge can be incorporated into the predictions of the machine learning system. Consequently, the context representation improves, and with it the performance of the machine learning system.
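The uncertainty-weighted accumulation of the first representations can be sketched as follows. This is a minimal illustration, not the patented implementation: `aggregate_context` is a hypothetical name, representations are flat lists of floats, and two details are assumptions rather than statements of the text, namely that the prior variance enters as its reciprocal and that each first representation is centered on the prior mean before weighting. With a zero prior mean the centering has no effect.

```python
def aggregate_context(first_reps, second_reps, prior_mean, prior_var):
    """Return (third, fourth) representation as element-wise lists of floats."""
    if len(first_reps) != len(second_reps):
        raise ValueError("need one variance vector per mean vector")
    dim = len(prior_mean)
    precision = [1.0 / v for v in prior_var]   # assumed prior precision
    weighted = [0.0] * dim                     # sum of (r_i - mu_0) / sigma_i^2
    for r, var in zip(first_reps, second_reps):
        for d in range(dim):
            precision[d] += 1.0 / var[d]
            weighted[d] += (r[d] - prior_mean[d]) / var[d]
    fourth = [1.0 / p for p in precision]
    third = [prior_mean[d] + fourth[d] * weighted[d] for d in range(dim)]
    return third, fourth


# Equal certainty: both context points influence the result equally.
equal_mu, equal_var = aggregate_context([[1.0], [3.0]], [[1.0], [1.0]], [0.0], [1.0])
# Second point is very uncertain: it is almost ignored.
skewed_mu, _ = aggregate_context([[1.0], [3.0]], [[1.0], [1e6]], [0.0], [1.0])
# No context at all: the result falls back to the prior.
empty_mu, empty_var = aggregate_context([], [], [0.0], [1.0])
```

Under these assumptions, an empty context returns the prior unchanged, and a context point with a very large variance shifts the accumulated expected value only marginally.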
[0044] It is also conceivable that the encoder includes a neural network configured to determine the first and the second representation of a latent representation of the plurality of latent representations based on the plurality of second input signals and second output signals.
[0045] The advantage is that neural networks can determine the first and second representations better than other methods from the field of machine learning. This further improves the performance of machine learning systems.
[0046] Alternatively, it is conceivable that the encoder includes a first neural network configured to determine a first representation of a latent representation among the plurality of latent representations based on a plurality of second input signals; and the encoder includes a second neural network configured to determine a second representation of a latent representation among the plurality of latent representations based on a plurality of second input signals.
[0047] The inventors surprisingly found that using two separate neural networks to determine the first and the second representation of a latent representation improves the performance of the machine learning system.
[0048] It is also conceivable that the decoder includes a neural network configured to determine a first output signal based on a third representation and a fourth representation and a first input signal, wherein the neural network is specifically configured to determine the first representation and the second representation.
[0049] The advantage is that neural networks can determine the first and second representations of the output signal better than other methods from the field of machine learning. This further improves the performance of machine learning systems.
[0050] Alternatively, it is conceivable that the decoder includes a first neural network configured to determine a first representation of the first output signal based on a third representation and a first input signal; and the decoder includes a second neural network configured to determine a second representation of the first output signal based on a fourth representation and the first input signal.
[0051] The inventors surprisingly found that using two separate neural networks to determine the first and the second representation of the first output signal improves the performance of the machine learning system.
[0052] It is also conceivable that the method for determining the output signal includes training a machine learning system, wherein the training includes the following steps:
[0053] • Determine multiple training input signals, wherein the multiple training input signals maintain a contextual relationship with each other, and each training input signal is assigned a corresponding training output signal;
[0054] • Divide multiple training input signals and training output signals into multiple second training input signals and corresponding second training output signals, and further divide them into at least one first training input signal and at least one corresponding first training output signal;
[0055] • Determine the third and fourth representations based on the plurality of second training input signals and these second training output signals;
[0056] • A predicted output signal for at least one first training input signal is determined by means of a decoder of a machine learning system and based on the determined third representation and the determined fourth representation and the first training input signal;
[0057] • Determine a loss value that represents the difference between the predicted output signal and the first training output signal;
[0058] • Determine the gradients of multiple parameters of the encoder and / or decoder with respect to the loss value;
[0059] • Change the multiple parameters based on the determined gradient.
[0060] The plurality of training input signals may include, for example, input signals from a sequence, such as sample points of individual images and / or audio signals from a video. The training output signal may characterize corresponding annotations of the input signals of the sequence, such as the location of an object in an image.
[0061] Preferably, the plurality of training input signals can be randomly divided into a plurality of second training input signals and a plurality of first training input signals. However, a predefined division can also be used. For example, it is conceivable that the machine learning system should be used to predict the continuation of the sequence of second input signals and second output signals. In this case, it may be advantageous to subdivide the sequence of training input signals such that a first portion of the sequence is used as a plurality of second training input signals and the remaining training input signals are used as a plurality of first training input signals.
[0062] Since the training output signal is assigned to the training input signal, the division of the training input signal can also be understood as the division of the training output signal.
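The two ways of dividing the training data mentioned above, a random division and a predefined division that uses the first portion of a sequence as context, can be sketched as follows. The function name `split_context_target` and its parameters are hypothetical and chosen only for illustration; inputs and outputs are kept paired so that dividing the inputs also divides the outputs.

```python
import random


def split_context_target(inputs, outputs, num_context, rng=None, sequential=False):
    """Divide paired training data into second (context) and first (target) signals.

    sequential=False: random division.
    sequential=True:  the first num_context elements of the sequence form the context.
    """
    if len(inputs) != len(outputs):
        raise ValueError("each training input needs one training output")
    if not 0 < num_context < len(inputs):
        raise ValueError("need at least one context and one target element")
    indices = list(range(len(inputs)))
    if not sequential:
        # rng may be a random.Random instance for reproducibility.
        (rng or random).shuffle(indices)
    context = [(inputs[i], outputs[i]) for i in indices[:num_context]]
    target = [(inputs[i], outputs[i]) for i in indices[num_context:]]
    return context, target


xs = [0, 1, 2, 3, 4]
ys = [0, 10, 20, 30, 40]
seq_context, seq_target = split_context_target(xs, ys, 3, sequential=True)
rnd_context, rnd_target = split_context_target(xs, ys, 2, rng=random.Random(0))
```

Varying `num_context` between training iterations is one way to expose the system to contexts of different sizes.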
[0063] Preferably, the steps described in the training method are repeated iteratively to train the machine learning system. In each training iteration, new training input signals and new training output signals can preferably be determined for the machine learning system, thus allowing the machine learning system to be trained using a variety of input and output signals.
[0064] The training input signals and corresponding training outputs can preferably be provided by a computer-implemented database. For each training iteration, the corresponding training input signals and training output signals can be randomly retrieved from the database.
[0065] The training input signal and the corresponding training output signal can also be understood together as training data.
[0066] Since the machine learning system can be understood as a chain of differentiable functions, the backpropagation algorithm can be used to determine the gradients for training. Based on these gradients, the machine learning system can preferably be trained using gradient descent.
[0067] Unlike training methods based on variational inference, the described training method can be understood as deterministic. This characteristic reduces training time because it is not necessary to determine different model parameters for the same training data, which is essential in training based on variational inference. The reduction in training time results in the machine learning system being trained with more training data within a fixed time interval. In turn, this training method can improve the performance of the machine learning system.
[0068] In particular, it is conceivable that the loss value is determined based on a loss function, which characterizes the probability density function of a normal distribution or a logarithmic probability density function, wherein a first representation of the predicted output signal determined by the decoder is used as the expectation value of the probability density function and a second representation of the predicted output signal determined by the decoder is used as the variance or covariance matrix of the probability density function.
[0069] Although this particular form of the loss function ignores possible correlations between elements of the sequence of first training input signals, the inventors surprisingly found that using this loss function during training still improves the performance of the machine learning system.
[0070] In particular, it is conceivable that the machine learning system is trained using a plurality of first training input signals and first training output signals, where the loss value is determined according to a third formula
[0071] $$\mathcal{L}\left(x^{(t)}, y^{(t)}\right) = -\sum_{j=1}^{m} \log \mathcal{N}\left(y_j^{(t)} \mid \mu_y\left(x_j^{(t)}\right), \sigma_y^2\left(x_j^{(t)}\right)\right)$$
[0072] where $x^{(t)}$ are the plurality of first training input signals, $y^{(t)}$ are the training output signals assigned to these first training input signals, $\mathcal{N}$ is the probability density function of a normal distribution, $\mu_y$ is the first representation of the predicted output signal determined by the decoder, and $\sigma_y^2$ is the second representation of the predicted output signal determined by the decoder, wherein the determined first representation is used as the expected value of the probability density function and the determined second representation is used as its variance or covariance matrix.
[0073] Here, $x^{(t)}$ can be understood as the plurality of first training input signals and $y^{(t)}$ as the plurality of first training output signals corresponding to them, where $m$ is the number of first training input signals or first training output signals, $x_j^{(t)}$ is the $j$-th element of the plurality of first training input signals, and $y_j^{(t)}$ is the first training output signal corresponding to $x_j^{(t)}$.
[0074] If only one first training input signal $x^{(t)}$ and one corresponding first training output signal $y^{(t)}$ are used, the same loss function can be used, with the summation in the third formula omitted.
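A loss of this kind can be sketched in plain Python for scalar outputs. The sketch makes assumptions that the text does not state explicitly: the loss is taken as the negative logarithm of the normal density so that it can be minimized by gradient descent, each target is treated independently with its own variance (no covariance matrix), and the function name `gaussian_nll` is hypothetical.

```python
import math


def gaussian_nll(targets, means, variances):
    """Sum of negative log normal densities over all target points.

    targets:   first training output signals (floats).
    means:     first representation of each predicted output signal.
    variances: second representation of each predicted output signal (> 0).
    """
    total = 0.0
    for y, mu, var in zip(targets, means, variances):
        if var <= 0.0:
            raise ValueError("variance must be positive")
        # -log N(y | mu, var) = 0.5 * (log(2*pi*var) + (y - mu)^2 / var)
        total += 0.5 * (math.log(2.0 * math.pi * var) + (y - mu) ** 2 / var)
    return total


exact = gaussian_nll([0.0], [0.0], [1.0])
confident_but_wrong = gaussian_nll([0.0], [3.0], [0.01])
uncertain_and_wrong = gaussian_nll([0.0], [3.0], [9.0])
```

In this sketch, a wrong prediction made with a small variance is penalized far more heavily than the same wrong prediction made with a large variance, which is what pushes the predicted variance toward a calibrated value. Because the function simply sums over however many target points it receives, the number of first training input signals may differ between iterations.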
[0075] The advantage of determining the loss value using the third formula is that the machine learning system can be trained using different numbers of first or second training input signals in each training iteration. This makes the machine learning system robust to different numbers of second input signals, for which the context representation should be determined at inference time. This improves the performance of the machine learning system.
Attached Figure Description
[0076] Embodiments of the present invention are described in more detail below with reference to the accompanying drawings. In the drawings:
[0077] Figure 1 schematically illustrates the construction of a machine learning system;
[0078] Figure 2 schematically illustrates the construction of a control system for controlling actuators based on the output signals of the machine learning system;
[0079] Figure 3 schematically illustrates an embodiment for controlling an at least partially autonomous robot;
[0080] Figure 4 schematically illustrates an embodiment for controlling a production system;
[0081] Figure 5 schematically illustrates an embodiment for controlling access;
[0082] Figure 6 schematically illustrates an embodiment for controlling a monitoring system;
[0083] Figure 7 schematically illustrates an embodiment for controlling a personal assistant;
[0084] Figure 8 schematically illustrates an embodiment of a training system for training the machine learning system.
Detailed Implementation
[0085] Figure 1 shows a machine learning system (60), which is configured to determine a first output signal based on a plurality (63) of second input signals, the second output signals assigned to these second input signals, and a first input signal.
[0086] Here, a second output signal can be understood as an annotation of the second input signal to which it is assigned. For example, a second output signal can be assigned to a second input signal by a human. Alternatively or additionally, it is conceivable that a second output signal characterizes a prediction of the machine learning system (60) for the second input signal, where this prediction is made before the first output signal is determined.
[0087] The first input signal and / or the second input signal and / or the second output signal may, in particular, include or consist of numerical values, which may exist in the form of scalars, vectors, matrices or tensors.
[0088] The machine learning system receives the plurality (63) of second input signals and the second output signals assigned to these second input signals in the encoder (61). The encoder (61) is preferably configured to determine a latent representation for each second input signal and the second output signal corresponding to it, where the latent representation includes a first representation characterizing an expected value and a second representation characterizing a variance.
[0089] The encoder (61) preferably includes two neural networks, wherein the first neural network of the encoder (61) is configured to determine the first representation based on the second input signal and the second output signal, and the second neural network of the encoder (61) is configured to determine the second representation based on the second input signal and the second output signal.
[0090] In order to process the second input signal and the corresponding second output signal, these can, for example, be concatenated into a vector, which is then passed to the encoder (61). Alternatively, it is conceivable that the neural networks of the encoder (61) are configured such that they each have two inputs, and the second input signal and the corresponding second output signal are passed separately through these two inputs.
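The concatenation variant with two separate encoder networks can be sketched as follows. This is only a toy illustration: single linear layers stand in for the neural networks, the names `encode_pair`, `linear`, and `softplus` are hypothetical, and the use of a softplus function to keep the variance positive is an assumption that the text does not specify.

```python
import math


def linear(weights, bias, x):
    """Affine map producing one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softplus(v):
    """Numerically stable log(1 + exp(v)); never negative."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def encode_pair(x, y, mean_net, var_net):
    """Map one (second input, second output) pair to a latent representation.

    mean_net, var_net: (weights, bias) tuples standing in for the two networks.
    Returns (first representation, second representation).
    """
    pair = list(x) + list(y)                      # concatenate input and output
    first = linear(mean_net[0], mean_net[1], pair)
    # Assumption: softplus plus a small floor keeps every variance positive.
    second = [softplus(v) + 1e-6 for v in linear(var_net[0], var_net[1], pair)]
    return first, second


mean_net = ([[0.5, -1.0, 2.0], [1.0, 0.0, 0.0]], [0.0, 0.1])
var_net = ([[0.1, 0.1, 0.1], [-3.0, 0.0, 0.0]], [0.0, 0.0])
first, second = encode_pair([1.0, 2.0], [0.5], mean_net, var_net)
```

Applying `encode_pair` to every context pair yields the plurality of latent representations that are subsequently accumulated.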
[0091] These latent representations are accumulated to form a context representation (z), where the context representation (z) includes a third representation ($\mu_z$), which characterizes the accumulated expected value, and a fourth representation ($\sigma_z^2$), which characterizes the accumulated variance. Preferably, the third representation ($\mu_z$) is determined according to the formula
[0092] $\mu_z = \mu_0 + \sigma_z^2 \odot \sum_i (r_i - \mu_0) \oslash \sigma_i^2$
[0093] and the fourth representation ($\sigma_z^2$) is determined according to the formula
[0094] $\sigma_z^2 = \left[ (\sigma_0^2)^{\ominus} + \sum_i (\sigma_i^2)^{\ominus} \right]^{\ominus}$
[0095] where $\mu_0$ is a prior assumption about the third representation, $\sigma_0^2$ is a prior assumption about the fourth representation, $r_i$ is the first representation of a latent representation determined by the encoder (61), $\sigma_z^2$ is the fourth representation, $\sigma_i^2$ is the second representation of that latent representation determined by the encoder (61), $\odot$ denotes element-wise multiplication, $\ominus$ denotes element-wise inversion, and $\oslash$ denotes element-wise division.
[0096] The third representation and the fourth representation can together be understood as the context representation (z).
[0097] Preferably, the zero element can be selected as the prior assumption about the third representation. If the third representation is a scalar, the zero element can have the value zero. For a third representation in the form of a vector, the zero element can be the zero vector, and for a third representation in the form of a matrix or tensor, the zero element can be a matrix or tensor filled entirely with zeros.
[0098] Preferably, the one element can be selected as the prior assumption about the fourth representation. If the fourth representation is a scalar, the one element can have the value one. For a fourth representation in the form of a vector, the one element can be a vector filled entirely with ones, and for a fourth representation in the form of a matrix or tensor, the one element can be a matrix or tensor filled entirely with ones.
[0099] Preferably, the third representation and the fourth representation have the same dimensions. Choosing the zero element and the one element in this way can be understood as making the prior assumption about the context representation a standard normal distribution, or a multivariate standard normal distribution if necessary.
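The accumulation described in paragraphs [0091] to [0099] can be sketched as follows. The sketch assumes that the accumulation is the element-wise, precision-weighted Gaussian update suggested by the operators named in paragraph [0095] (element-wise multiplication, inversion and division), with a zero prior mean and a prior variance of one as defaults. The function name `aggregate` and the example values are hypothetical.

```python
def aggregate(first_reps, second_reps, prior_mean=None, prior_var=None):
    """Accumulate latent means and variances into a context mean and variance."""
    dim = len(first_reps[0])
    mu0 = list(prior_mean) if prior_mean is not None else [0.0] * dim  # zero element
    var0 = list(prior_var) if prior_var is not None else [1.0] * dim   # one element

    # Fourth representation: element-wise inverse of the summed inverse variances.
    precision = [1.0 / v for v in var0]
    for var in second_reps:
        precision = [p + 1.0 / v for p, v in zip(precision, var)]
    var_z = [1.0 / p for p in precision]

    # Third representation: prior mean plus the variance-weighted sum of deviations,
    # scaled element-wise by the fourth representation.
    weighted = [0.0] * dim
    for r, var in zip(first_reps, second_reps):
        weighted = [w + (ri - m) / vi for w, ri, m, vi in zip(weighted, r, mu0, var)]
    mu_z = [m + vz * w for m, vz, w in zip(mu0, var_z, weighted)]
    return mu_z, var_z

# Two latent representations with two dimensions each.
mu_z, var_z = aggregate([[1.0, 2.0], [3.0, 2.0]], [[1.0, 0.5], [1.0, 0.5]])
```

Under these assumptions, a latent representation with a small second representation (low variance) pulls the accumulated expected value more strongly than one with a large variance, and every additional latent representation can only reduce the accumulated variance.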
[0100] Next, the context representation and the first input signal are passed to the decoder (62) of the machine learning system (60). The decoder (62) is configured to determine the first output signal from the first input signal and the context representation. The first output signal includes a first representation, which characterizes the expected value of a prediction of the machine learning system (60) with respect to the first input signal, and a second representation, which characterizes the variance of the prediction. This can be understood as the machine learning system (60) using the first output signal to provide a prediction with uncertainty. Here, the prediction can characterize a classification of the first input signal. Alternatively or additionally, it is conceivable that the prediction characterizes a regression of a real value, a real vector, a real matrix, or a real tensor.
[0101] Preferably, the first representation and the second representation of the first output signal have the same dimensions as the second output signal.
[0102] The decoder (62) preferably includes two neural networks, wherein the first neural network of the decoder (62) determines the first representation of the first output signal and the second neural network of the decoder (62) determines the second representation of the first output signal.
[0103] In order to process the first input signal and the context representation, the values included in the first input signal and in the context representation can be concatenated, and this concatenation can be passed to the neural networks of the decoder (62). Alternatively, it is conceivable that the neural networks of the decoder (62) have separate inputs for the first input signal and the context representation.
[0104] In other embodiments (not shown), it is conceivable that the machine learning system (60) determines a first output signal for each of a plurality of first input signals. This can be understood as the machine learning system (60) being able to process a batch of first input signals.
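The decoder of paragraphs [0100] to [0104] can be illustrated in the same spirit. The sketch assumes one dense layer per network, random illustrative weights, and a softplus transform for the variance. Following claim 7, it also assumes that the first network sees the third representation and the second network sees the fourth representation. All function names and example values are hypothetical.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (illustrative only)."""
    weights = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(layer, vec):
    """Apply one dense layer: weights @ vec + bias."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]

def softplus(v):
    # Assumption: softplus is used only to keep the predicted variance positive.
    return math.log1p(math.exp(v))

def decode(x, mu_z, var_z, mean_net, var_net):
    """Return (expected value, variance) of the prediction for one first input signal."""
    mean = dense(mean_net, list(x) + list(mu_z))  # input concatenated with third representation
    variance = [softplus(v) + 1e-6 for v in dense(var_net, list(x) + list(var_z))]
    return mean, variance

def decode_batch(xs, mu_z, var_z, mean_net, var_net):
    """Process a batch of first input signals against the same context representation."""
    return [decode(x, mu_z, var_z, mean_net, var_net) for x in xs]

rng = random.Random(1)
mu_z, var_z = [0.3, -0.1, 0.8, 0.0], [0.5, 0.2, 0.9, 1.0]
mean_net = make_layer(2 + 4, 1, rng)  # 2 input values + 4 context values -> 1 output value
var_net = make_layer(2 + 4, 1, rng)
predictions = decode_batch([[0.0, 1.0], [0.5, -0.5], [2.0, 2.0]], mu_z, var_z, mean_net, var_net)
```

Because the context representation is computed once and reused for every first input signal, batch processing only repeats the decoder step.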
[0105] Figure 2 shows an actuator (10) in its surrounding environment (20) interacting with a control system (40), wherein the control system (40) includes a machine learning system (60) and the actuator (10) is controlled according to at least one first output signal (y) of the machine learning system (60).
[0106] The surrounding environment (20) is detected by a sensor (30), particularly an imaging sensor such as a camera sensor, at preferably uniform time intervals. This can also be provided by multiple sensors, such as a stereo camera. The sensor signals (S) of the sensor (30) – or each sensor signal (S) in the case of multiple sensors – are transmitted to the control system (40). Thus, the control system (40) receives the sequence of sensor signals (S). Based on this, the control system (40) determines control signals (A), which are transmitted to the actuator (10).
[0107] The control system (40) receives the sequence of sensor signals (S) from the sensor (30) in an optional receiving unit (50), which converts the sequence of sensor signals (S) into a sequence of first input signals (x) (alternatively, each sensor signal (S) can be used directly as an input signal (x)). The first input signal (x) may be, for example, a fragment of the sensor signal (S) or the result of further processing of the sensor signal (S). In other words, the first input signal (x) is determined based on the sensor signal (S). The sequence of first input signals (x) is fed to the machine learning system (60). Additionally, a plurality (63) of second input signals and second output signals is fed to the machine learning system (60). The plurality (63) of second input signals and second output signals can be understood as the context (63) of the first input signal (x).
[0108] The machine learning system (60) is preferably parameterized by parameters (φ), which are stored in and provided by a parameter memory (P).
[0109] The machine learning system (60) determines a first output signal (y) based on the input signal (x) and the context (63). The output signal (y) is fed to an optional modification unit (80) which determines control signals (A) accordingly, which are fed to the actuator (10) to control the actuator (10) in a corresponding manner.
[0110] The actuator (10) receives the control signal (A), is correspondingly controlled, and performs a corresponding action. In this case, the actuator (10) may include (not necessarily structurally integrated) control logic that determines a second control signal to be used to control the actuator (10) based on the control signal (A).
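The signal flow of paragraphs [0106] to [0110] can be sketched as a chain of three functions. This is only an illustration: the stand-in for the machine learning system (60) is a simple distance-weighted average rather than the encoder and decoder described above, and the rule in the modification unit (a lower confidence bound compared against a hypothetical `safe_distance`) is an assumption that does not appear in the description.

```python
import math

def receiving_unit(sensor_signal):
    """Optional receiving unit (50): here it passes on a fragment of the sensor signal."""
    return sensor_signal[:2]

def machine_learning_system(x, context):
    """Stand-in for system (60): returns (expected value, variance) for x given the context."""
    weights = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, cx))) for cx, _ in context]
    total = sum(weights)
    mean = sum(w * cy for w, (_, cy) in zip(weights, context)) / total
    var = sum(w * (cy - mean) ** 2 for w, (_, cy) in zip(weights, context)) / total + 1e-6
    return mean, var

def modification_unit(y, safe_distance=5.0):
    """Optional modification unit (80): turns the output signal into a control signal A."""
    mean, var = y
    lower_bound = mean - 2.0 * math.sqrt(var)  # hypothetical cautious estimate
    return "brake" if lower_bound < safe_distance else "hold"

# Context (63): pairs of second input signal and second output signal.
context = [([0.0, 0.0], 12.0), ([1.0, 0.0], 9.0), ([2.0, 0.0], 6.5)]
sensor_signal = [2.1, 0.0, 0.3]

x = receiving_unit(sensor_signal)
y = machine_learning_system(x, context)
control_signal = modification_unit(y)
```

The point of the sketch is that the variance travels with the expected value up to the modification unit, so the control signal can depend on how certain the prediction is.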
[0111] In other embodiments, the control system (40) includes a sensor (30). In still other embodiments, alternatively or additionally, the control system (40) also includes an actuator (10).
[0112] In other preferred embodiments, the control system (40) includes at least one processor (45) and at least one machine-readable storage medium (46) on which commands are stored, which, when executed on the at least one processor (45), cause the control system (40) to perform the method according to the invention.
[0113] In alternative implementations, a display unit (10a) is provided in place of the actuator (10) or in addition to the actuator.
[0114] Figure 3 shows how the control system (40) can be used to control an at least partially autonomous robot, here an at least partially autonomous motor vehicle (100).
[0115] The motor vehicle (100) may have multiple sensors (30), such as different types of sensors (30), such as lidar sensors, camera sensors, and / or ultrasonic sensors. Preferably, these sensors (30) are built into the vehicle. In this case, the first input signal (x) can be understood as an input image.
[0116] It is conceivable that the machine learning system is configured to identify objects that can be recognized on an input image (x). Therefore, the first output signal (y) can characterize the location of the object, and the variance about that location, which can be understood as the uncertainty about the object's precise location. Context (63) may, for example, include input images (x) that occurred in the past, for which objects were detected at previous points in time. In this case, the machine learning system (60) can be understood as such that the machine learning system should determine the location of the object on the current input image based on past input images and the objects detected on those input images.
[0117] Alternatively, it is conceivable that the context includes input images from other sensors (30) of the vehicle. In this case, the machine learning system (60) can be understood as performing the fusion of sensor signals (S), where the result of the fusion carries uncertainty.
[0118] The actuator (10) preferably arranged in the motor vehicle (100) may be, for example, the braking device, drive device or steering device of the motor vehicle (100). Then, the control signal (A) may be determined to cause the actuator or these actuators (10) to be manipulated so that, especially when certain types of objects are involved, such as pedestrians, the motor vehicle (100) avoids collision with objects identified by the machine learning system (60).
[0119] Alternatively or additionally, the display unit (10a) can be controlled using a control signal (A), and, for example, the identified object can be displayed. It is also conceivable that the display unit (10a) is controlled using the control signal (A) to output an optical or audible warning signal if it is determined that the vehicle (100) is about to collide with one of the identified objects. This can also be achieved through a tactile warning signal, for example, through vibration of the steering wheel of the vehicle (100).
[0120] Alternatively, the at least partially autonomous robot may also be other mobile robots (not shown), such as those that move by flying, floating, diving, or walking. The mobile robot may also be, for example, an at least partially autonomous lawnmower or an at least partially autonomous cleaning robot. In these cases, the control signal (A) may also be determined to cause the drive and / or steering mechanisms of the mobile robot to be manipulated to prevent the at least partially autonomous robot from colliding with objects identified by the machine learning system (60), for example.
[0121] Figure 4 shows an embodiment in which the control system (40) is used to operate the production machine (11) of a production system (200) by manipulating the actuator (10) that controls the production machine (11). The production machine (11) may be, for example, a machine for stamping, sawing, drilling and / or cutting. It is also conceivable that the production machine (11) is configured to grip finished products (12a, 12b) by means of a gripper.
[0122] The sensor (30) could be, for example, a video sensor that detects the conveyor belt (13) where finished products (12a, 12b) may be present. In this case, the input signal (x) is an input image (x). The machine learning system (60) could be configured, for example, to determine the position of the finished products (12a, 12b) on the conveyor belt based on the input signal (x). Then, the actuator (10) controlling the production machine (11) could be manipulated according to the determined position of the finished products (12a, 12b). For example, the actuator (10) could be manipulated to punch, saw, drill, and / or cut the finished products (12a, 12b) at predetermined locations.
[0123] Input signals (x) that occurred at past points in time, together with the positions of the finished products (12a, 12b) determined on each of them, can be passed to the machine learning system (60) as context (63).
[0124] It is also conceivable that the machine learning system (60) is configured to determine other characteristics of the finished products (12a, 12b) besides location. In particular, it is conceivable that the machine learning system (60) determines whether the finished products (12a, 12b) are defective and / or damaged. In this case, the actuator (10) can be manipulated to cause the production machine (11) to reject defective and / or damaged finished products (12a, 12b).
[0125] Figure 5 shows an embodiment in which the control system (40) is used to control an access system (300). The access system (300) may include a physical access control device, such as a door (401). The sensor (30) may be, in particular, a video sensor or a thermal imaging sensor configured to detect the area in front of the door (401). The detected image can be interpreted by means of the machine learning system (60). The machine learning system (60) may, in particular, detect persons in the input image (x) transmitted to it. If multiple persons are detected simultaneously, their identities can be determined particularly reliably, for example, by correlating these persons (i.e., objects) with one another, such as by analyzing their movements.
[0126] Input signals (x) that occurred at past points in time, together with the persons detected on them, can be passed to the machine learning system (60) as context (63).
[0127] The actuator (10) can be a lock that activates or deactivates the access control device based on the control signal (A), for example, opening or closing the door (401). For this purpose, the control signal (A) can be selected based on the output signal (y) determined by the machine learning system (60) for the input image (x). For example, it is conceivable that the output signal (y) includes information characterizing the identity of the person detected by the machine learning system (60), and the control signal (A) is selected based on that person's identity.
[0128] Logical access control devices can also be used instead of physical access control devices.
[0129] Figure 6 shows an embodiment in which the control system (40) is used to control a monitoring system (400). This embodiment differs from the embodiment shown in Figure 5 in that, instead of the actuator (10), a display unit (10a) is provided, which is controlled by the control system (40). For example, the sensor (30) can record an input image (x) on which at least one person can be identified, and the position of the at least one person can be detected by means of the machine learning system (60). The input image (x) can then be displayed on the display unit (10a), with the detected persons highlighted in color.
[0130] Figure 7 shows an embodiment in which the control system (40) is used to control a personal assistant (250). Preferably, the sensor (30) is an optical sensor, such as a video sensor or a thermal imaging sensor, which receives an image of a gesture of the user (249).
[0131] Based on the signal from the sensor (30), the control system (40) determines the control signal (A) for the personal assistant (250), for example, by the machine learning system (60) performing gesture recognition. The determined control signal (A) is then transmitted to the personal assistant (250), which is thus controlled accordingly. The determined control signal (A) can in particular be selected to correspond to a presumed control desired by the user (249). This presumed desired control can be determined based on the gesture recognized by the machine learning system (60). The control system (40) can then select the control signal (A) for transmission to the personal assistant (250) according to the presumed desired control.
[0132] The corresponding operation may include, for example, a personal assistant (250) retrieving information from a database and reproducing that information in a manner acceptable to the user (249).
[0133] Instead of a personal assistant (250), household appliances (not shown), in particular washing machines, stoves, ovens, microwave ovens or dishwashers, can also be provided so that they are controlled accordingly.
[0134] Alternatively, it is conceivable that a personal assistant (250) could also be controlled via voice commands from a user (249). The context (63) could, for example, characterize the sequence of sampling times and sample values of the audio signal from the audio sensor (30), where the machine learning system (60) is configured to predict other sample values for other sampling times. This context, along with these other sampling times and the predicted sample values, could then be handed over to a classifier, which determines the classification of the voice command based on its input.
[0135] Figure 8 shows an embodiment of a training system (140) for training the machine learning system (60) of the control system (40) using a training dataset (T). The training dataset (T) preferably includes multiple sequences of input signals (x_i) that are used to train the machine learning system (60), wherein the training dataset (T) also includes, for each input signal (x_i), a desired output signal (y_i) that corresponds to the input signal (x_i) and characterizes a classification and / or regression of the input signal (x_i).
[0136] For training purposes, the training data unit (150) accesses a computer-implemented database (St2), which provides the training dataset (T). The training data unit (150) determines, preferably at random, at least one sequence of input signals and the corresponding desired output signals from the training dataset (T). Then, the training data unit (150) divides the input signals of the sequence, preferably at random, into a first plurality of input signals and a second plurality of input signals. Alternatively, it is conceivable that the sequence is split at predefined points in order to determine the first plurality and the second plurality.
[0137] The input signals of the second plurality and the output signals corresponding to them are provided to the machine learning system (60) as context, while the input signals of the first plurality are provided to the machine learning system (60) as first input signals. Based on these inputs, the machine learning system (60) determines a corresponding output signal for each input signal of the first plurality. After this determination, there thus exists, for each input signal of the first plurality, a determined output signal and a desired output signal.
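The division of a training sequence described in paragraphs [0136] and [0137] can be sketched as follows. The function name `split_context_target` and the parameter `split_index` are hypothetical. The sketch also assumes, for the predefined split, that the elements before the split point form the second plurality (context), which the description does not specify.

```python
import random

def split_context_target(inputs, outputs, rng, split_index=None):
    """Divide a sequence of (input, desired output) pairs into context and target pairs.

    Returns (context, target): the second plurality, provided as context, and the
    first plurality, for which output signals are to be determined.
    """
    pairs = list(zip(inputs, outputs))
    if split_index is not None:
        # Predefined split point (assumption: the leading part is the context).
        return pairs[:split_index], pairs[split_index:]
    indices = list(range(len(pairs)))
    rng.shuffle(indices)
    n_target = rng.randint(1, len(pairs) - 1)  # keeps both pluralities non-empty
    target_idx = set(indices[:n_target])
    target = [p for i, p in enumerate(pairs) if i in target_idx]
    context = [p for i, p in enumerate(pairs) if i not in target_idx]
    return context, target

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [0.1, 0.4, 0.9, 1.6, 2.1]
context, target = split_context_target(xs, ys, random.Random(0))
fixed_context, fixed_target = split_context_target(xs, ys, random.Random(0), split_index=3)
```

Drawing a new random split for each training step exposes the machine learning system to contexts of different sizes, so that it can be used later with whatever amount of context is available.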
[0138] The desired output signals and the determined output signals are transmitted to the changing unit (180).
[0139] Next, based on the desired output signals and the determined output signals, the changing unit (180) determines new parameters for the machine learning system (60). For this purpose, the changing unit (180) compares the desired output signals with the determined output signals by means of a loss function. The loss function determines a first loss value, which characterizes how far the determined output signals deviate from the desired output signals (y_i).
[0140] Preferably, the loss function determines the loss value according to the formula
[0141]
[0142] in which the following quantities appear: the input signals of the first plurality; the desired output signals corresponding to the input signals of the first plurality; the probability density function of a normal distribution; a first output of the decoder (62) for the input signals of the first plurality and the third representation; and a second output of the decoder (62) for these input signals and the fourth representation, wherein the first output is used as the expected value of the probability density function and the second output is used as the variance or covariance matrix of the probability density function. The third representation and the fourth representation are determined by means of the encoder (61) of the machine learning system (60), based on the second plurality.
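The formula itself is not reproduced in the text, so its exact form cannot be stated here. A common loss of the kind described in paragraph [0142] is the negative logarithm of a normal probability density, and the following minimal sketch assumes that form. It further assumes scalar, independent outputs, that is, a diagonal covariance matrix. The function name `gaussian_nll` is hypothetical.

```python
import math

def gaussian_nll(desired, means, variances):
    """Negative log-density of the desired outputs under independent normal distributions.

    means and variances play the role of the decoder's first and second output.
    """
    total = 0.0
    for y, m, v in zip(desired, means, variances):
        total += 0.5 * (math.log(2.0 * math.pi * v) + (y - m) ** 2 / v)
    return total

exact = gaussian_nll([0.0], [0.0], [1.0])           # prediction matches the desired output
off_confident = gaussian_nll([2.0], [0.0], [0.1])   # wrong and very certain
off_uncertain = gaussian_nll([2.0], [0.0], [4.0])   # wrong but uncertain
```

Under this assumed loss, a prediction that is wrong and reports a small variance is penalized far more than one that is equally wrong but reports a large variance. This rewards the well-calibrated uncertainty the method aims for.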
[0143] The changing unit (180) determines the new parameters based on the first loss value. In this embodiment, this is done using a gradient descent method, preferably stochastic gradient descent, Adam, or AdamW.
[0144] The determined new parameters are stored in the model parameter memory (St1). Preferably, the determined new parameters are provided to the machine learning system (60) as parameters (φ).
[0145] In other preferred embodiments, the described training is repeated iteratively for a predetermined number of iterations, or until the first loss value falls below a predetermined threshold. Alternatively or additionally, it is also conceivable that the training terminates when the average first loss value on a test or validation dataset falls below a predetermined threshold. In at least one of these iterations, the new parameters determined in a previous iteration are used as the parameters (φ) of the machine learning system (60).
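The iteration scheme of paragraphs [0143] to [0145] can be illustrated with a deliberately small example. Instead of the encoder and decoder, the sketch trains only two parameters, a mean `m` and a log-variance `s`, with plain gradient descent on a Gaussian negative log-likelihood. The model, the learning rate and the threshold are assumptions chosen for illustration.

```python
import math

def loss_and_grads(m, s, ys):
    """Gaussian negative log-likelihood and its gradients for mean m and log-variance s."""
    inv_var = math.exp(-s)
    loss = sum(0.5 * (math.log(2.0 * math.pi) + s + (y - m) ** 2 * inv_var) for y in ys)
    grad_m = sum(-(y - m) * inv_var for y in ys)
    grad_s = sum(0.5 * (1.0 - (y - m) ** 2 * inv_var) for y in ys)
    return loss, grad_m, grad_s

def train(ys, lr=0.05, max_iters=5000, loss_threshold=None):
    """Repeat gradient steps for max_iters iterations or until the loss is below the threshold."""
    m, s = 0.0, 0.0
    loss = float("inf")
    step = 0
    for step in range(1, max_iters + 1):
        loss, grad_m, grad_s = loss_and_grads(m, s, ys)
        if loss_threshold is not None and loss < loss_threshold:
            break  # second stopping criterion: loss below the predetermined threshold
        # The parameters of the previous iteration are replaced by the new parameters.
        m -= lr * grad_m
        s -= lr * grad_s
    return m, s, loss, step

data = [1.0, 2.0, 3.0]
m_full, s_full, loss_full, steps_full = train(data)                      # fixed number of iterations
m_thr, s_thr, loss_thr, steps_thr = train(data, loss_threshold=3.7)      # stop at threshold
```

Stochastic gradient descent, Adam or AdamW would replace the two update lines. The stopping logic stays the same.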
[0146] Furthermore, the training system (140) may include at least one processor (145) and at least one machine-readable storage medium (146) containing instructions that, when executed by the processor (145), cause the training system (140) to implement the training method according to any of these aspects of the invention.
[0147] The term "computer" includes any device used to run computational rules that can be given in advance. These computational rules can exist in the form of software, hardware, or a combination of both.
[0148] In general, a "plurality" can be understood as indexed, meaning that a unique index is assigned to each element of the plurality, preferably by assigning consecutive integers to the elements contained in the plurality. Preferably, if a plurality includes N elements, where N is the number of elements in the plurality, the elements are assigned the integers from 1 to N.
Claims
1. A computer-implemented method for determining a first output signal (y) by means of a machine learning system (60), wherein the first output signal (y) characterizes a classification and / or regression of a first input signal (x) and the first output signal (y) comprises a first representation, which characterizes an expected value of the classification and / or regression, and a second representation, which characterizes a variance of the classification and / or regression, wherein the first input signal (x) includes at least a portion of an image, a portion of an audio signal, and / or sensor records of a sensor of a machine, wherein the sensor records include power consumption and / or voltage and / or rotational speed and / or velocity and / or temperature and / or pressure, wherein the method includes the following steps for this determination: determining a plurality of latent representations by means of the encoder (61) of the machine learning system (60), wherein each latent representation of the plurality of latent representations is determined based on at least one second input signal and a second output signal corresponding to the second input signal, wherein the second input signal and the second output signal characterize a context of the first input signal (x) and the latent representation includes a first representation and a second representation, wherein the first representation characterizes an expected value and the second representation characterizes a variance, and wherein the second input signal includes at least a portion of an image, a portion of an audio signal, and / or sensor records of a sensor of the machine, wherein the sensor records include power consumption and / or voltage and / or rotational speed and / or velocity and / or temperature and / or pressure; determining a third representation based on the first representations of the latent representations of the plurality of latent representations, wherein the third representation characterizes an accumulation of these first representations; determining a fourth representation based on the second representations of the latent representations of the plurality of latent representations, wherein the fourth representation characterizes an accumulation of these second representations; determining the first output signal (y) by means of the decoder (62) of the machine learning system (60), wherein the decoder (62) determines the first output signal (y) based on the third representation, the fourth representation and the first input signal (x).
2. The method according to claim 1, wherein the fourth representation ($\sigma_z^2$) is determined according to the formula $\sigma_z^2 = \left[ (\sigma_0^2)^{\ominus} + \sum_i (\sigma_i^2)^{\ominus} \right]^{\ominus}$, where $\sigma_0^2$ is a prior assumption about the fourth representation, $\sigma_i^2$ is the second representation, determined by the encoder (61), of the latent representation of the i-th input signal and the i-th output signal of the plurality of second input signals and second output signals, and $\ominus$ denotes the element-wise reciprocal.
3. The method according to any one of claims 1 or 2, wherein the third representation ($\mu_z$) is determined according to the formula $\mu_z = \mu_0 + \sigma_z^2 \odot \sum_i (r_i - \mu_0) \oslash \sigma_i^2$, where $\mu_0$ is a prior assumption about the third representation, $r_i$ is the first representation, determined by the encoder (61), of the latent representation of the i-th input signal and the i-th output signal of the plurality of second input signals and second output signals, $\sigma_z^2$ is the fourth representation, $\sigma_i^2$ is the second representation, determined by the encoder (61), of the latent representation of the i-th input signal and the i-th output signal of the plurality of second input signals and second output signals, $\odot$ denotes element-wise multiplication and $\oslash$ denotes element-wise division.
4. The method according to any one of claims 1 to 3, wherein the encoder (61) comprises a neural network configured to determine the first representations and the second representations of the latent representations based on the plurality of second input signals and second output signals.
5. The method according to any one of claims 1 to 3, wherein the encoder (61) comprises a first neural network configured to determine the first representations of the latent representations based on the plurality of second input signals and second output signals, and wherein the encoder (61) further comprises a second neural network configured to determine the second representations of the latent representations based on the plurality of second input signals and second output signals.
6. The method according to any one of claims 1 to 5, wherein the decoder (62) comprises a neural network configured to determine the first output signal (y) based on the third representation, the fourth representation and the first input signal (x).
7. The method according to any one of claims 1 to 5, wherein the decoder (62) comprises a first neural network configured to determine the first representation of the first output signal (y) based on the third representation and the first input signal (x), and wherein the decoder (62) further comprises a second neural network configured to determine the second representation of the first output signal (y) based on the fourth representation and the first input signal (x).
8. The method according to any one of claims 1 to 7, wherein the method for determining the output signal (y) includes training the machine learning system (60), wherein the training includes the following steps: determining a plurality of training input signals, wherein the training input signals of the plurality are in a contextual relationship with one another and each training input signal is assigned a corresponding training output signal; dividing the plurality of training input signals and training output signals into a plurality of second training input signals with corresponding second training output signals and at least one first training input signal with at least one corresponding first training output signal; determining the third representation and the fourth representation based on the plurality of second training input signals and second training output signals; determining a predicted output signal for the at least one first training input signal by means of the decoder (62) of the machine learning system (60), based on the determined third representation, the determined fourth representation and the first training input signal; determining a loss value, which characterizes a difference between the predicted output signal and the first training output signal; determining a gradient of a plurality of parameters of the encoder (61) and / or the decoder (62) with respect to the loss value; changing the plurality of parameters based on the determined gradient.
9. The method of claim 8, wherein the loss value is determined based on a loss function, wherein the loss function characterizes a probability density function of a normal distribution or a logarithmized probability density function of a normal distribution, wherein the first representation of the predicted output signal determined by the decoder (62) is used as the expected value of the probability density function and the second representation of the predicted output signal determined by the decoder (62) is used as the variance or covariance matrix of the probability density function.
10. The method of claim 8 or 9, wherein the machine learning system (60) is trained using a plurality of first training input signals and training output signals, wherein the loss value is determined according to a formula in which the following quantities appear: the plurality of first training input signals; the training output signals correspondingly assigned to these first training input signals; the probability density function of a normal distribution; the first representation of the predicted output signal determined by the decoder (62); and the second representation of the predicted output signal determined by the decoder (62), wherein the first representation is used as the expected value of the probability density function and the second representation is used as the variance or covariance matrix of the probability density function.
11. The method according to any one of claims 1 to 10, wherein the device (100, 200, 250, 300, 400) is operated according to the first output signal (y).
12. A machine learning system (60) configured to implement the method according to any one of claims 1 to 11.
13. A training device (140) configured to perform the method according to any one of claims 8 to 10.
14. A computer program product comprising a computer program configured to perform the method according to any one of claims 1 to 11 when the computer program is executed by a processor (45, 145).
15. A machine-readable storage medium (46, 146) having a computer program stored thereon, the computer program being configured to perform the method according to any one of claims 1 to 11 when the computer program is executed by a processor (45, 145).