Method and system for generating machine learning models for a vehicle's voice assistant

The method and system generate machine learning models for vehicle voice assistants by using a language model for semantic processing and a classification model for function assignment, addressing the challenge of varied user commands and interface differences, ensuring efficient recognition and control of vehicle functions across multiple versions.

DE102024136402A1 · Pending · Publication Date: 2026-06-11 · BAYERISCHE MOTOREN WERKE AG

Patent Information

Authority / Receiving Office
DE · DE
Patent Type
Applications
Current Assignee / Owner
BAYERISCHE MOTOREN WERKE AG
Filing Date
2024-12-06
Publication Date
2026-06-11

AI Technical Summary

Technical Problem

Existing voice assistants in vehicles struggle to accurately interpret varied user commands due to differences in application programming interfaces (APIs) across vehicle models and software updates, leading to complexity in classifying and assigning voice inputs to vehicle functions.

Method used

A method and system for generating machine learning models that utilize a language model to process user input semantically and a classification model to assign vehicle functions, allowing for easy adaptation to multiple functional interface versions by training only the classification model without altering the language model's parameters.

Benefits of technology

Enables voice assistants to efficiently recognize and control vehicle functions despite variations in user input and interface versions, reducing computational effort and storage requirements, and facilitating updates through remote processing.

Generated by Eureka AI based on patent content.


Abstract

In a method for generating machine learning models for a voice assistant of a vehicle (302), a first trained machine learning model is generated, using first training data and without changing the parameters of the language model, from a machine learning model comprising a language model and a classification model. The language model is trained to generate, based on a user input corresponding to a vehicle function desired by a vehicle occupant (310), an output corresponding to the semantic meaning of the user input. The classification model is configured to accept the output of the language model as input. The first trained machine learning model is trained to identify, based on the user input, a vehicle function from a first group of vehicle functions that most likely corresponds to the desired vehicle function. Furthermore, using second training data and without changing the parameters of the language model, a second trained machine learning model is generated. This model is trained to identify, based on user input, a vehicle function from a second group of vehicle functions that most likely corresponds to the desired vehicle function. The first training data comprise sample user inputs, each labeled with a corresponding vehicle function from the first group of vehicle functions. The second training data comprise sample user inputs, each labeled with a corresponding vehicle function from the second group of vehicle functions. The first and second groups comprise different vehicle functions and/or the vehicle functions of the first group and the vehicle functions of the second group have different function parameters.

Description

[0001] The invention relates to a method for generating machine learning models for a vehicle's voice assistant. A further aspect of the invention relates to a system for generating machine learning models for a vehicle's voice assistant. The invention further relates to a vehicle device for controlling vehicle functions.

[0002] Many vehicle functions in modern vehicles can be activated with spoken user input. This allows, for example, a driver to operate vehicle functions without taking their hands off the steering wheel or their eyes off the road. However, so-called voice assistants are limited by their language understanding and can often only correctly interpret specific, predefined commands. Variations in voice input that deviate from these predefined commands pose a challenge. Since modern vehicles have a multitude of functions, for example 1000, selecting the correct function is particularly challenging. Users can express their commands in very different ways, which increases the complexity of correctly classifying and assigning voice inputs to the corresponding vehicle functions. Adding to the complexity is the fact that the application programming interfaces (APIs) supported by a voice assistant can differ between vehicle models or be modified through software updates and adjustments. Furthermore, different third-party applications may be installed in different vehicles. The result is an unmanageable number of API versions that could theoretically exist across a single manufacturer's vehicles.

[0003] The object of the invention is to provide a method and a system for generating machine learning models for a vehicle's voice assistant, with which voice assistants can be generated particularly easily for a large number of functional interface versions.

[0004] This object is achieved by a method comprising the features of claim 1 and by the subject matter of the further independent claims. Further developments are specified in the dependent claims.

[0005] A method for generating machine learning models for a vehicle's voice assistant is proposed. The method starts from a machine learning model comprising a language model and a classification model. The language model is trained to generate, based on a user input corresponding to a vehicle function desired by a vehicle occupant, an output corresponding to the semantic meaning of the user input. The classification model is configured to accept the language model's output as input. Using first training data and without changing the parameters of the language model, a first trained machine learning model is generated from this machine learning model. The first trained machine learning model is trained to identify, based on the user input, a vehicle function from a first group of vehicle functions that most likely corresponds to the desired vehicle function. Furthermore, using second training data and without changing the parameters of the language model, a second trained machine learning model is generated from the machine learning model. This model is trained to identify, based on user input, a vehicle function from a second group of vehicle functions that most likely corresponds to the desired vehicle function. The first training data comprise sample user inputs, each labeled with a corresponding vehicle function from the first group of vehicle functions. The second training data comprise sample user inputs, each labeled with a corresponding vehicle function from the second group of vehicle functions. The first and second groups comprise different vehicle functions. Alternatively or additionally, the vehicle functions of the first group and the vehicle functions of the second group have different function parameters.

[0006] The first and second groups correspond, for example, to different versions of a functional interface. For instance, the second group corresponds to a newer version of the functional interface, and the first group corresponds to an older version. In such an implementation, the first trained machine learning model and the second trained machine learning model are generated sequentially to update, for example, a vehicle's voice assistant to the newer version of the functional interface. The newer version of the functional interface may, for example, provide new vehicle functions. Vehicle functions of the newer version of the functional interface may also have different function parameters compared to the older version. The first and second groups can also be different functional interfaces for different vehicle derivatives. In such an embodiment, the first trained machine learning model and the second trained machine learning model can be generated simultaneously or in any desired order.

[0007] The method is based on a machine learning model with the proposed architecture consisting of a language model and a classification model. Both the language model and the classification model have parameters (also called weights) that can be modified through training. The language model is already trained to generate output based on user input that corresponds to the semantic meaning of the user's input. The classification model can be untrained or it can be an already trained classification model that is further trained within the framework of the proposed method. The method generates at least two trained machine learning models from the machine learning model.

[0008] In the trained machine learning models, user input is first processed by the language model. The language model generates an output from the user input that characterizes its semantic content in a form processable by the classification model. This output enables the classification model to assign user inputs that are worded differently but relate to the same vehicle function to that same function. For example, a user might want to open a window of the vehicle. The user input could be: "Open the window." However, the vehicle occupant could also use alternative phrasings, such as "Make the window open" or "Lower the window." The language model then generates an output for each of these user inputs, and all of these outputs correspond to the semantic meaning of "Open the window." The classification model then assigns the vehicle function "open window" to all these outputs. This enables a voice assistant that incorporates a trained machine learning model according to the proposed architecture to control vehicle functions even based on complex and variably formulated user input.

[0009] According to the invention, it was discovered that the training of such a machine learning model can be simplified if only the parameters of the classification model are changed, i.e., if only the classification model is trained, but the language model remains unchanged. Since the parameters of the language model are not changed when generating the trained machine learning models, the output of the language model does not differ between the first and second trained machine learning models. When generating the first trained machine learning model, the classification model is trained to assign a vehicle function from the first group to the output of the language model. Similarly, when generating the second trained machine learning model, the classification model is trained to assign a vehicle function from the second group to the output of the language model. This eliminates the considerable computational effort required to train the language model. Therefore, the proposed method makes it particularly easy to create voice assistants, even for a large number of functional interface versions.

[0010] The method also makes it particularly easy to update existing vehicle voice assistants to a newer version of the functional interface. The proposed architecture allows only the classification model to be replaced, so that for an update, only the existing classification model needs to be replaced with one retrained using the newer version of the functional interface. If a trained machine learning model is run in a processing unit of the vehicle itself, the update can even be performed via mobile network, since a classification model of just a few megabytes is sufficient for the proposed architecture. If trained machine learning models are run in a processing unit remote from the vehicle (i.e., a remote processing unit), the update can be performed remotely via mobile network. When operated from the backend, massive storage space can be saved because the various trained machine learning models can share a common language model, which typically requires 1000 times or more storage space than the classification models.
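The storage argument can be made concrete with a short calculation. The sketch below is illustrative only: the classification model size and the number of interface versions are assumptions, and only the ratio of roughly 1000 between language model and classification model is taken from the text.

```python
# Illustrative sizes only (assumptions): a 2 MB classification model and a
# language model 1000 times larger, following the ratio stated in the text.
CLASSIFIER_MB = 2
LANGUAGE_MODEL_MB = 1000 * CLASSIFIER_MB
NUM_INTERFACE_VERSIONS = 50  # hypothetical number of functional interface versions


def storage_separate(n_versions: int) -> int:
    """Backend storage if every trained model carries its own language model copy."""
    return n_versions * (LANGUAGE_MODEL_MB + CLASSIFIER_MB)


def storage_shared(n_versions: int) -> int:
    """Backend storage with one shared language model and one classifier per version."""
    return LANGUAGE_MODEL_MB + n_versions * CLASSIFIER_MB


separate = storage_separate(NUM_INTERFACE_VERSIONS)
shared = storage_shared(NUM_INTERFACE_VERSIONS)
print(f"separate copies: {separate} MB, shared language model: {shared} MB")
```

Under these assumed numbers, sharing the language model reduces backend storage from about 100 GB to about 2 GB, and each additional interface version costs only the size of one classification model.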

[0011] In one embodiment, the language model is trained using a generic text corpus. This generic text corpus comprises texts that are not limited to a specific subject area. For example, the Toronto Book Corpus or a filtered version of Wikipedia can be used as the generic text corpus. Training with this generic text corpus gives the language model a general understanding of language and a broad knowledge base. This general knowledge is also referred to as world knowledge. This world knowledge enables the language model to generate essentially the same output from different user inputs with the same semantic content, even if it has not been specifically trained on those user inputs.

[0012] In one embodiment, the language model is either a Large Language Model or a Small Language Model. Large Language Models (LLMs) and Small Language Models (SLMs) are classes of language models that differ primarily in the size of the training dataset used to train them. SLMs can be optimized for a specific task. LLMs are typically trained with a text corpus that can be several hundred gigabytes in size. SLMs are typically trained with a text corpus that is only a few gigabytes in size. LLMs and SLMs also differ in the number of parameters and thus the size of the model itself. An LLM can have one hundred billion parameters or more; for example, GPT-3 has 175 billion parameters. An SLM typically has no more than ten billion parameters; for example, BERT has 340 million parameters. By appropriately selecting the model size, low latency can be ensured, and the language model can also be run on the limited hardware of a vehicle. For example, BERT can be operated with an inference time of 20 ms, which is imperceptible to humans. BERT, or one of its many further developments and successors, such as DistilBERT, ALBERT, RoBERTa, ELECTRA, and T5, can be used as the language model.

[0013] In one embodiment, the classification model comprises at least one neural network. Neural networks are capable of reliably recognizing complex patterns even in high-dimensional datasets. This enables the neural network to assign outputs of the language model to the same vehicle function even if the underlying user inputs differ so much in semantic content that their relation to that function is not immediately obvious, for example, "Open the window" and "The air is very bad." Alternatively or additionally, the classification model can include further elements, such as elements of a transformer architecture, for example one or more attention heads.

[0014] In one embodiment, the output of the language model is an embedding that corresponds to the semantic meaning of the user input. An embedding is a mathematical representation of the semantic content of the user input, for example, a vector in a high-dimensional vector space. Embeddings of user inputs with similar semantic content will be located closer together in this high-dimensional vector space, the embedding space, than embeddings of user inputs with different semantic content. This allows "similar meaning" to be defined mathematically very easily using regions in the embedding space. This enables the appropriately trained classification model to robustly assign user inputs that differ from one another but relate to the same vehicle function to that vehicle function.
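The notion of closeness in the embedding space can be illustrated with cosine similarity, one common way of comparing embeddings. The application does not prescribe a particular distance measure, and the three-dimensional vectors below are hand-picked toy values (assumptions), not outputs of a real language model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-picked toy embeddings (assumption). A real language model would
# produce vectors with hundreds of dimensions.
toy_embeddings = {
    "Open the window": [0.9, 0.1, 0.0],
    "Lower the window": [0.8, 0.2, 0.1],
    "Call <person>": [0.0, 0.1, 0.9],
}

same_meaning = cosine_similarity(
    toy_embeddings["Open the window"], toy_embeddings["Lower the window"]
)
different_meaning = cosine_similarity(
    toy_embeddings["Open the window"], toy_embeddings["Call <person>"]
)
print(f"same meaning: {same_meaning:.2f}, different meaning: {different_meaning:.2f}")
```

With these toy values, the two window-related inputs score close to 1, while the unrelated input scores close to 0, which is the property the classification model relies on.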

[0015] In one embodiment, the output of the classification model comprises an ordered list that assigns a numerical value to each vehicle function in the first group and the second group, respectively, indicating the probability that this vehicle function corresponds to the desired vehicle function. In such an embodiment, the classification model resolves the output of the language model into a vector whose dimensionality corresponds to the number of controllable vehicle functions. Each entry in this vector corresponds to the probability that one of the vehicle functions is the desired vehicle function. A voice assistant can then, for example, identify the vehicle function with the highest probability and control it. If the desired vehicle function cannot be uniquely identified, an output unit of a vehicle can be controlled, for example, to prompt the vehicle occupant to repeat the user input.
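A minimal sketch of this embodiment, assuming the classification model ends in a softmax layer and that "not uniquely identified" is modeled by a probability threshold. Both are assumptions for illustration, and the function names and the threshold value are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw classifier scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_function(functions, scores, min_probability=0.6):
    """Return the most probable function, or None if no function is clear enough.

    min_probability is a hypothetical threshold; None stands for the case in
    which the occupant would be prompted to repeat the input.
    """
    probabilities = softmax(scores)
    best = max(range(len(functions)), key=lambda i: probabilities[i])
    if probabilities[best] < min_probability:
        return None, probabilities
    return functions[best], probabilities


FUNCTIONS = ["open_window", "start_route_guidance", "activate_seat_heating"]
clear_choice, _ = select_function(FUNCTIONS, [4.0, 0.5, 0.2])
unclear_choice, _ = select_function(FUNCTIONS, [1.0, 0.9, 0.8])
print(clear_choice, unclear_choice)
```

The probability vector has one entry per controllable vehicle function, so its length changes with the function group while the embedding size stays fixed.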

[0016] In one embodiment, the output of the classification model includes unique identifiers of at least two vehicle functions from the first group and the second group, respectively, that most likely correspond to the desired vehicle function. Each identifier is assigned a numerical value indicating the probability that the respective vehicle function corresponds to the desired vehicle function. In this embodiment, the classification model generates a list of the vehicle functions that most likely correspond to the desired vehicle function. A voice assistant can use this list to determine which vehicle function should be activated. If the desired vehicle function can be uniquely identified from the list, it can be activated immediately. In some embodiments, however, an output unit of a vehicle can also be controlled to display the most likely vehicle functions to the vehicle occupant and to allow the selection of one of these vehicle functions.
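The candidate-list embodiment could look roughly as follows. This is a sketch under assumptions: the identifiers, the probability values, and the margin used to decide whether the best candidate is unique are all hypothetical.

```python
def top_k_functions(probabilities, k=2):
    """Return the k most probable (identifier, probability) pairs, best first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def resolve(candidates, margin=0.3):
    """Activate the best candidate if it clearly leads, otherwise ask the occupant.

    margin is a hypothetical criterion for "uniquely identified".
    """
    (best_id, best_p), (_, second_p) = candidates[0], candidates[1]
    if best_p - second_p >= margin:
        return ("activate", best_id)
    return ("ask_occupant", [identifier for identifier, _ in candidates])


ambiguous = {"open_window": 0.48, "activate_ventilation": 0.44, "start_route_guidance": 0.08}
clear = {"open_window": 0.90, "activate_ventilation": 0.07, "start_route_guidance": 0.03}

ambiguous_result = resolve(top_k_functions(ambiguous))
clear_result = resolve(top_k_functions(clear))
print(ambiguous_result, clear_result)
```

In the ambiguous case the two candidates would be shown on the output unit for selection; in the clear case the function can be activated immediately.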

[0017] According to a further aspect, the invention relates to a system for generating machine learning models for a vehicle's voice assistant. The system comprises a memory element on which a machine learning model, comprising a language model and a classification model, as well as first and second training data, are stored. The language model is trained to generate, based on a user input corresponding to a vehicle function desired by a vehicle occupant, an output that corresponds to the semantic meaning of the user input. The classification model is configured to accept the output of the language model as input. The first training data comprise sample user inputs, each labeled with a corresponding vehicle function from a first group of vehicle functions. The second training data comprise sample user inputs, each labeled with a corresponding vehicle function from a second group of vehicle functions. The first and second groups comprise different vehicle functions and/or the vehicle functions of the first group and the vehicle functions of the second group have different function parameters. The system further comprises a processing unit that is configured to generate two trained machine learning models from the machine learning model, in each case without changing the parameters of the language model. Using the first training data, the processing unit generates a first trained machine learning model, which is trained to determine, based on user input, a vehicle function from the first group of vehicle functions that most likely corresponds to the desired vehicle function. Using the second training data, the processing unit generates a second trained machine learning model, which is trained to determine, based on user input, a vehicle function from the second group of vehicle functions that most likely corresponds to the desired vehicle function.

[0018] The system has the same advantages as the claimed method. In particular, the system can be further developed with features described in this document in connection with the method. Furthermore, the claimed method can be further developed with features described in this document in connection with the system.

[0019] The invention further relates to a device for a vehicle for controlling vehicle functions. The device comprises a storage element on which a machine learning model, generated using the method described in this document, is stored; a receiver module configured to receive spoken user input from the vehicle occupant; a processing module configured to load and execute the machine learning model from the storage element; and a control module configured to select and control a vehicle function that most likely corresponds to the desired vehicle function, based on the output of the machine learning model.

[0020] The device utilizes one of the voice assistants generated using the claimed method and therefore has the same advantages as the claimed method and the claimed system. In particular, the device can be further developed with features described in this document in connection with the method and the system. Furthermore, the claimed method and the claimed system can be further developed with features described in this document in connection with the device.

[0021] In one embodiment, the receiver module is configured to convert the user input into a text format that can be processed by the language model. For example, the receiver module can be configured to generate, from the spoken user input, a text-based list of words (tokens) corresponding to the user input. The language model can be kept particularly simple if the input to the language model is in text form.

[0022] In one embodiment, the processing module is part of the vehicle. For example, the processing module is part of a processing unit within the vehicle, such as a central vehicle computer. Alternatively, the processing module can be implemented, at least partially, by a processing unit located remote from the vehicle, such as a server, or in a cloud computing environment. For example, the language model is executed on a server remote from the vehicle or in a cloud computing environment. In such an embodiment, the language model is preferably stored on a memory element of the processing unit located remote from the vehicle. This allows the use of a language model that might not be able to run on the limited hardware of the vehicle, or only with very high latency. The processing module can also be part of a mobile device that can be installed in the vehicle, for example, a smartphone or a tablet computer belonging to a vehicle occupant. In such an embodiment, the language model is preferably stored on a memory element of the mobile device.

[0023] Exemplary embodiments of the invention are explained in more detail below with reference to the figures. These show: Fig. 1 a schematic representation of a system for generating machine learning models for a vehicle voice assistant according to an exemplary embodiment; Fig. 2 a flowchart of a procedure for generating machine learning models for a vehicle voice assistant according to an exemplary embodiment; Fig. 3 a schematic representation of a device of a vehicle for controlling vehicle functions according to one embodiment; and Fig. 4 a flowchart of a procedure for controlling vehicle functions according to an embodiment.

[0024] Fig. 1 shows a schematic representation of a system 100 for generating machine learning models for a voice assistant of a vehicle 302 according to an exemplary embodiment. The system 100 comprises a memory element 102 and a processing unit 104. In one embodiment, the system 100 is formed by one or more computers, each comprising at least one memory element 102 and at least one processor.

[0025] Memory element 102 stores a machine learning model comprising a language model and a classification model. Both the language model and the classification model have parameters that can be modified through training. The language model is trained to generate, based on a user input corresponding to a vehicle function desired by a vehicle occupant 310, an output that corresponds to the semantic meaning of the user input. The desired vehicle function is one that the vehicle 302 is intended to execute. The classification model is configured to accept the output of the language model as input. Furthermore, memory element 102 stores first and second training data, which can be used to train the machine learning model. The first training data comprise sample user inputs, each labeled with a corresponding vehicle function from a first group of vehicle functions. The second training data comprise sample user inputs, each labeled with a corresponding vehicle function from a second group of vehicle functions.

[0026] The first and second groups have different vehicle functions. Alternatively or additionally, the vehicle functions of the first group and the vehicle functions of the second group have different function parameters. For example, the first and second groups correspond to different versions of a function interface that provide different vehicle functions which can be controlled by voice user input. However, the first and second groups can also be different versions of a function interface for different vehicle derivatives. For example, some vehicle derivatives have a voice-controlled sunroof, which other vehicle derivatives do not.
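One way to picture two such groups is as mappings from function identifiers to their function parameters. The sketch below uses hypothetical identifiers and parameter names (none of them appear in the application) to show the two kinds of difference described above: functions present in only one group, and shared functions with different parameters.

```python
# Hypothetical function groups (assumption): identifier -> tuple of parameter names.
FIRST_GROUP = {
    "open_window": ("position",),
    "activate_seat_heating": ("seat",),
}
SECOND_GROUP = {
    "open_window": ("position", "percent"),  # same function, extra parameter
    "activate_seat_heating": ("seat",),
    "open_sunroof": (),  # only available in some vehicle derivatives
}


def compare_groups(first, second):
    """List functions that were added, removed, or changed their parameters."""
    shared = set(first) & set(second)
    return {
        "added": sorted(set(second) - set(first)),
        "removed": sorted(set(first) - set(second)),
        "changed_parameters": sorted(name for name in shared if first[name] != second[name]),
    }


differences = compare_groups(FIRST_GROUP, SECOND_GROUP)
print(differences)
```

Each group defines the label set for one classification model, so any of these differences calls for a separately trained classification model, while the language model stays the same.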

[0027] Processing unit 104 is configured to generate a first trained machine learning model from the machine learning model using the first training data and without modifying the parameters of the language model. This first trained machine learning model is trained to identify, based on user input, a vehicle function from the first group of vehicle functions that most likely corresponds to the desired vehicle function. Processing unit 104 is further configured to generate a second trained machine learning model from the machine learning model using the second training data and without modifying the parameters of the language model. The second trained machine learning model is trained to identify, based on user input, a vehicle function from the second group of vehicle functions that most likely corresponds to the desired vehicle function. To generate the first and second trained machine learning models, processing unit 104 executes, for example, the program code of a corresponding computer program stored on memory element 102, on another memory element, or in processing unit 104 itself. The training of the first and second trained machine learning models is described in more detail below with reference to Fig. 2.

[0028] Fig. 2 shows a flowchart of a method for generating machine learning models for a voice assistant of a vehicle 302 according to an exemplary embodiment. The method is explained, purely by way of example, with reference to the system 100 shown in Fig. 1.

[0029] The procedure is started in step S200. In step S202, the machine learning model, which includes the language model and the classification model, is provided. This can be a machine learning model where only the language model is trained, meaning that the parameters of the classification model are, for example, in an initial state. Alternatively, it can be a machine learning model that has already been trained according to the procedure described here or another procedure and is retrained using the procedure described here. Step S202 can be performed, for example, by memory element 102, where the machine learning model is stored and thus provided. Training the language model can be performed as an optional step within the procedure. Alternatively, a previously trained language model can be used.

[0030] In step S204, the first trained machine learning model is created using the first training data. The first training data set is fed into the language model as input to generate an output for each sample user input, which is then fed into the classification model as training input. The parameters of the classification model are varied until its output corresponds to the correct vehicle functions. The parameters of the language model remain unchanged. In step S206, the second trained machine learning model is created using the second training data. Again, only the parameters of the classification model are changed; the parameters of the language model remain unchanged. Steps S204 and S206 can be performed simultaneously or in any order. The first trained machine learning model and the second trained machine learning model can then be made available for use in a vehicle voice assistant 302. The procedure is then completed in step S208.
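Steps S204 and S206 can be sketched with NumPy as follows. This is not the applicant's implementation: the "language model" is a deliberately simple stand-in (hashed bag of words with a fixed random projection), the classification model is a single softmax layer trained by gradient descent, and all identifiers and sample inputs are hypothetical. The point of the sketch is only that the stand-in's parameters are never updated while two classifiers with different output sizes are trained.

```python
import zlib

import numpy as np

HASH_BUCKETS = 1024
EMBED_DIM = 32
# Parameters of the stand-in language model: created once, never updated below.
LM_WEIGHTS = np.random.default_rng(0).normal(size=(HASH_BUCKETS, EMBED_DIM))


def frozen_language_model(text):
    """Stand-in for a pre-trained language model (assumption): text -> embedding."""
    counts = np.zeros(HASH_BUCKETS)
    for word in text.lower().split():
        counts[zlib.crc32(word.encode("utf-8")) % HASH_BUCKETS] += 1.0
    embedding = counts @ LM_WEIGHTS
    norm = np.linalg.norm(embedding)
    return embedding / norm if norm > 0 else embedding


def train_classifier(samples, functions, epochs=500, learning_rate=1.0):
    """Train only the classification model on outputs of the frozen language model."""
    x = np.stack([frozen_language_model(text) for text, _ in samples])
    targets = np.eye(len(functions))[[functions.index(label) for _, label in samples]]
    weights = np.zeros((EMBED_DIM, len(functions)))
    bias = np.zeros(len(functions))
    for _ in range(epochs):
        logits = x @ weights + bias
        logits = logits - logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs = probs / probs.sum(axis=1, keepdims=True)
        grad = (probs - targets) / len(samples)  # gradient of the cross-entropy loss
        weights -= learning_rate * (x.T @ grad)
        bias -= learning_rate * grad.sum(axis=0)
    return weights, bias


def predict(text, functions, classifier):
    """Return the function with the highest classifier score for the given text."""
    weights, bias = classifier
    scores = frozen_language_model(text) @ weights + bias
    return functions[int(np.argmax(scores))]


FIRST_GROUP = ["open_window", "activate_seat_heating"]
SECOND_GROUP = FIRST_GROUP + ["open_sunroof"]
FIRST_DATA = [
    ("open the window", "open_window"),
    ("lower the window", "open_window"),
    ("turn on the seat heating", "activate_seat_heating"),
    ("activate the seat heating", "activate_seat_heating"),
]
SECOND_DATA = FIRST_DATA + [
    ("open the sunroof", "open_sunroof"),
    ("slide the sunroof back", "open_sunroof"),
]

lm_snapshot = LM_WEIGHTS.copy()
first_classifier = train_classifier(FIRST_DATA, FIRST_GROUP)  # step S204
second_classifier = train_classifier(SECOND_DATA, SECOND_GROUP)  # step S206
lm_unchanged = np.array_equal(lm_snapshot, LM_WEIGHTS)
print(lm_unchanged, first_classifier[0].shape, second_classifier[0].shape)
```

Because the two training runs share nothing but the unchanged language model, they can run in either order or in parallel, as the paragraph above notes.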

[0031] In the embodiments described with reference to Fig. 1 and Fig. 2, at least the memory element 102 and the processing unit 104 form the system 100 for generating machine learning models for a voice assistant of a vehicle 302. In further embodiments, elements and features other than those shown in Fig. 1 and Fig. 2 and mentioned in this document can also be part of the system 100. Likewise, method steps described with reference to the system 100 can be part of the claimed method.

[0032] Fig. 3 shows a schematic representation of a device 300 of a vehicle 302 for controlling vehicle functions according to one embodiment. Vehicle functions controllable by the device 300 include, for example, opening a window of the vehicle 302, starting route guidance, activating seat heating of the vehicle 302, activating ventilation of the vehicle 302, initiating a call, controlling lighting, controlling an entertainment system of the vehicle 302, controlling a vehicle mode, providing an operating aid, querying the status, and providing a support function of the vehicle 302. The device 300 comprises a receiver module 304, a processing module 306, and a control module 308, which are shown purely by way of example as parts of the vehicle 302.

[0033] The receiver module 304 is configured to receive spoken user input from a vehicle occupant 310. The user input always corresponds to the vehicle function requested by the vehicle occupant 310. To receive the user input in spoken form, the receiver module 304 can be configured to receive the user input as audio data from a microphone 312, for example, a microphone of the vehicle 302 or a microphone of a mobile device connected to the vehicle 302. From the user input, the receiver module 304 can generate audio data or convert the spoken user input into text and make it available for further processing by the device 300. The receiver module 304 is shown purely by way of example as part of a processing unit 314 of the vehicle 302, for example, a central vehicle computer.

[0034] The processing module 306 is designed to run the machine learning model described with reference to Fig. 1 and Fig. 2. This means that the processing module 306 is configured to load and execute the machine learning model from a memory element, for example, a memory element 316 of the processing unit 314 of the vehicle 302. The processing module 306 is also shown, purely by way of example, as part of the processing unit 314 of the vehicle 302. In other embodiments, however, the processing module 306 can also be formed wholly or partially by a processing unit located remote from the vehicle 302. In particular, the language model can be executed on such a processing unit 104 located remote from the vehicle 302. In such an embodiment, the language model is preferably stored on a memory element of the processing unit located remote from the vehicle 302.

[0035] The control module 308 is designed to control a vehicle function based on the output of the classification model. For example, the control module 308 determines, based on the output of the classification model, which of the vehicle functions has been classified by the classification model as the most probable vehicle function and controls it. If the control module 308 cannot unambiguously determine which vehicle function is the desired vehicle function based on the output of the classification model, for example, if none of the vehicle functions has been clearly classified as the most probable vehicle function, the control module 308 can, for example, control an output unit 318 of the vehicle 302 to prompt the vehicle occupant 310 to repeat the user input. Like the receiver module 304 and the processing module 306, the control module 308 is also shown purely as an example as part of the processing unit 314 of the vehicle 302.

[0036] Fig. 4 shows a flowchart of a method for controlling vehicle functions according to one embodiment. The method can be carried out, for example, using the device 300 according to Fig. 3.

[0037] In step S400, the method is started. In step S402, a spoken user input from the vehicle occupant 310 is received, which corresponds to a vehicle function desired by the vehicle occupant 310. With this user input, the vehicle occupant 310 specifies which vehicle function should be activated. For example, the vehicle occupant 310 says "open the window" or "make the window open" if they want a window of the vehicle 302 to be opened. In another example, the vehicle occupant 310 says "Navigate me to Lauchstädter Straße 11 in Munich" if they want to start route guidance. The vehicle occupant 310 could also say "Turn on the seat heating on the driver's seat" to activate the seat heating of the driver's seat, "Activate the ventilation" to activate the ventilation, "Call <person>" to call that person, or "Play the radio station BR-Klassik", "Play music on <third-party application>", or "Let the children play a video game" to control an entertainment system of the vehicle 302. The input can also be a generic input, for example "Confirm the input", or a status query, for example "How can I find charging points along my route?", "Why is a warning light on?", or "Did I leave something in the car?" The user input is received, for example, by the receiver module 304 of the device 300.

[0038] In step S404, using the language model and based on the user input, an output is generated that corresponds to the semantic meaning of the user input. The output is, for example, an embedding or another mathematical representation of the meaning of the user input. Embeddings of user inputs with the same or similar meaning are closer together than embeddings of user inputs with different meanings. This allows "similar meaning" to be defined mathematically. Step S404 is performed, for example, by processing module 306. Training the language model can be performed as an optional step within the procedure. Alternatively, a pre-trained language model can be used.
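The statement that embeddings with similar meaning lie closer together can be illustrated with a minimal sketch. The four-dimensional vectors below are hypothetical values chosen for illustration and are not produced by any actual language model, and cosine similarity is assumed here as one common closeness measure; the description does not prescribe a particular one.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equally long vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings (assumption): real embeddings have far more dimensions.
EMBEDDINGS = {
    "open the window": [0.9, 0.1, 0.0, 0.2],
    "make the window open": [0.8, 0.2, 0.1, 0.2],
    "play the radio station BR-Klassik": [0.0, 0.1, 0.9, 0.3],
}

# Two phrasings of the same request.
same_meaning = cosine_similarity(
    EMBEDDINGS["open the window"], EMBEDDINGS["make the window open"]
)
# Two requests for unrelated vehicle functions.
different_meaning = cosine_similarity(
    EMBEDDINGS["open the window"],
    EMBEDDINGS["play the radio station BR-Klassik"],
)

print(f"same meaning: {same_meaning:.3f}, different meaning: {different_meaning:.3f}")
```

With these illustrative vectors, the two window commands score much higher than the window command paired with the radio command, which is the property the classification model relies on.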

[0039] In step S406, using the classification model and based on the output of the language model, an output is generated indicating which of the vehicle functions controllable by the control module 308 most likely corresponds to the desired vehicle function. For example, the classification model generates an ordered list of numerical values from the high-dimensional vector in the embedding space, i.e., another vector with a significantly smaller dimension. Thus, the classification model assigns to the meaning of the user input determined by the language model one or more specific vehicle functions that are likely to be the ones to be controlled. Step S406, like step S404, is performed, for example, by the processing module 306.
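The mapping from an embedding to an ordered list of numerical values can be sketched as follows. This is an illustrative assumption only: a single linear layer followed by a softmax stands in for the classification model, and the function identifiers, weights, and embedding values are hypothetical.

```python
import math

# Hypothetical identifiers of the controllable vehicle functions (assumption).
VEHICLE_FUNCTIONS = ["open_window", "start_navigation", "seat_heating_on"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def classify(embedding, weights, biases):
    """Map an embedding to one probability per vehicle function."""
    scores = [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


# Hypothetical parameters: one weight row per vehicle function.
WEIGHTS = [
    [2.0, 0.0, -1.0, 0.0],
    [0.0, 2.0, 0.0, -1.0],
    [-1.0, 0.0, 2.0, 0.0],
]
BIASES = [0.0, 0.0, 0.0]

embedding = [0.9, 0.1, 0.0, 0.2]  # hypothetical output of the language model
probabilities = classify(embedding, WEIGHTS, BIASES)

# Ordered list: most probable vehicle function first.
ranked = sorted(
    zip(VEHICLE_FUNCTIONS, probabilities), key=lambda pair: pair[1], reverse=True
)
for name, probability in ranked:
    print(f"{name}: {probability:.3f}")
```

The output vector has one entry per vehicle function, so its dimension is much smaller than that of a realistic embedding.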

[0040] The classification model is trained using the method described in this document. This training can be carried out as an optional step of the method described with reference to Fig. 4. In particular, the training of the classification model can be repeated as part of the method described with reference to Fig. 4 if, for example, new vehicle functions are available in the vehicle 302.
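Training only the classification model while leaving the language model untouched can be sketched as follows. Everything in this sketch is an assumption made for illustration: the frozen language model is replaced by a fixed keyword-based stand-in named `embed`, the classification model is a softmax regression trained by stochastic gradient descent, and the function identifiers and sample inputs are hypothetical.

```python
import math

# Stand-in for the frozen language model (assumption): fixed keyword features.
# The tuple is never modified, mirroring unchanged language model parameters.
KEYWORDS = ("window", "navigate", "heating", "ventilation")


def embed(text):
    """Return a fixed 'embedding' of the text; has no trainable parameters."""
    words = text.lower().split()
    return [1.0 if keyword in words else 0.0 for keyword in KEYWORDS]


def softmax(scores):
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, head):
    weights, biases = head
    return softmax(
        [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    )


def train_head(samples, functions, epochs=200, learning_rate=0.5):
    """Train only the classification head on labeled sample user inputs."""
    head = ([[0.0] * len(KEYWORDS) for _ in functions], [0.0] * len(functions))
    weights, biases = head
    for _ in range(epochs):
        for text, label in samples:
            x = embed(text)  # frozen: computed, never updated
            probabilities = forward(x, head)
            for k, name in enumerate(functions):
                # Gradient of the cross-entropy loss with respect to score k.
                gradient = probabilities[k] - (1.0 if name == label else 0.0)
                biases[k] -= learning_rate * gradient
                for j, xj in enumerate(x):
                    weights[k][j] -= learning_rate * gradient * xj
    return head


def predict(text, head, functions):
    probabilities = forward(embed(text), head)
    return functions[probabilities.index(max(probabilities))]


# Two groups of vehicle functions, e.g. before and after a new function is added.
GROUP_1 = ["open_window", "start_navigation", "seat_heating_on"]
GROUP_2 = GROUP_1 + ["ventilation_on"]

DATA_1 = [
    ("open the window", "open_window"),
    ("make the window open", "open_window"),
    ("navigate me to munich", "start_navigation"),
    ("turn on the seat heating", "seat_heating_on"),
]
DATA_2 = DATA_1 + [("activate the ventilation", "ventilation_on")]

head_1 = train_head(DATA_1, GROUP_1)  # first trained model
head_2 = train_head(DATA_2, GROUP_2)  # second trained model, same frozen embed()

print(predict("make the window open", head_1, GROUP_1))
print(predict("activate the ventilation", head_2, GROUP_2))
```

Because both heads share the same `embed` function, supporting the second group only requires training and storing a second, small set of head parameters.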

[0041] In step S408, a vehicle function of vehicle 302 is selected and activated based on the output of the classification model. For example, the most probable vehicle function from the classification model output is selected and activated. If, for example, several vehicle functions are equally probable, or if none of the vehicle functions is more probable than a predetermined threshold, such as 50%, the vehicle occupant 310 may also be prompted to repeat their user input. Step S408 is performed, for example, by the control module 308. The procedure is then terminated in step S410.
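The selection logic of step S408 can be sketched as follows. The function name `select_function`, the tie tolerance, and the probability values are hypothetical; the 50% threshold is taken from the example above.

```python
def select_function(probabilities, threshold=0.5, tie_tolerance=1e-6):
    """Return the vehicle function to activate, or None to request a repeat.

    probabilities maps a vehicle function identifier to its probability.
    None signals that the vehicle occupant should repeat the user input.
    """
    if not probabilities:
        return None
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    best_name, best_probability = ranked[0]
    # No vehicle function is more probable than the predetermined threshold.
    if best_probability <= threshold:
        return None
    # Several vehicle functions are (practically) equally probable.
    if len(ranked) > 1 and abs(best_probability - ranked[1][1]) <= tie_tolerance:
        return None
    return best_name


clear_case = select_function(
    {"open_window": 0.8, "start_navigation": 0.15, "seat_heating_on": 0.05}
)
below_threshold = select_function(
    {"open_window": 0.4, "start_navigation": 0.35, "seat_heating_on": 0.25}
)
tied = select_function(
    {"open_window": 0.45, "start_navigation": 0.45, "seat_heating_on": 0.1},
    threshold=0.3,
)

for result in (clear_case, below_threshold, tied):
    print(result if result is not None else "Please repeat your input.")
```

Returning `None` rather than a best guess keeps the decision to re-prompt with the caller, which in the described device would be the control module 308 driving the output unit 318.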

[0042] In the embodiments described with reference to Fig. 3 and Fig. 4, at least the receiving module 304, the processing module 306, and the control module 308 form the device 300 of a vehicle 302 for controlling vehicle functions. In further embodiments, further elements and features shown in Fig. 3 and Fig. 4 and mentioned in the preceding description can be part of the device 300. Likewise, the process steps described with reference to the device 300 can be part of the method for controlling vehicle functions.

Reference symbol list

100 System
102 Storage element
104 Processing unit
300 Device
302 Vehicle
304 Receiver module
306 Processing module
308 Control module
310 Vehicle occupant
312 Microphone
314 Processing unit
316 Storage element
318 Output unit

Claims

1. Method for generating machine learning models for a voice assistant of a vehicle (302), in which, from a machine learning model comprising a language model trained to generate, based on a user input corresponding to a vehicle function desired by a vehicle occupant (310), an output corresponding to the semantic meaning of the user input, and a classification model trained to accept the output of the language model as input, a first trained machine learning model is generated using first training data and without changing the parameters of the language model, which is trained to determine, based on the user input, a vehicle function from a first group of vehicle functions that most likely corresponds to the desired vehicle function, and, using second training data and without changing the parameters of the language model, a second trained machine learning model is generated, which is trained to determine, based on the user input, a vehicle function from a second group of vehicle functions that most likely corresponds to the desired vehicle function, wherein the first training data comprises exemplary user inputs, each labeled with a corresponding vehicle function from the first group of vehicle functions, wherein the second training data comprises exemplary user inputs, each labeled with a corresponding vehicle function from the second group of vehicle functions, and wherein the first group and the second group have different vehicle functions and/or the vehicle functions of the first group and the vehicle functions of the second group have different function parameters.

2. Method according to claim 1, wherein the language model has been trained using a generic text corpus.

3. Method according to claim 1 or 2, wherein the language model is a Large Language Model or a Small Language Model.

4. Method according to one of the preceding claims, wherein the classification model comprises at least one neural network.

5. Method according to one of the preceding claims, wherein the output of the language model is an embedding that corresponds to the semantic meaning of the user input.

6. Method according to one of the preceding claims, wherein the output of the classification model comprises an ordered list which includes, for each vehicle function of the first group or the second group, a numerical value indicating the probability with which this vehicle function corresponds to the desired vehicle function.

7. Method according to one of claims 1 to 5, wherein the output of the classification model comprises unique identifiers of at least two vehicle functions of the first group or the second group that most likely correspond to the desired vehicle function, each of which is assigned a numerical value indicating the probability that the respective vehicle function corresponds to the desired vehicle function.

8. System (100) for generating machine learning models for a voice assistant of a vehicle (302), comprising: a memory element (102) on which a machine learning model, comprising a language model and a classification model, first training data and second training data are stored, wherein the language model is trained to generate, based on a user input corresponding to a vehicle function desired by a vehicle occupant (310), an output which corresponds to the semantic meaning of the user input, and the classification model is trained to accept the output of the language model as input, wherein the first training data comprise exemplary user inputs, each labeled with a corresponding vehicle function from a first group of vehicle functions, and the second training data comprise exemplary user inputs, each labeled with a corresponding vehicle function from a second group of vehicle functions, wherein the first group and the second group have different vehicle functions and/or the vehicle functions of the first group and the vehicle functions of the second group have different function parameters; and a processing unit (104) configured to generate, from the machine learning model, using the first training data and without changing the parameters of the language model, a first trained machine learning model which is trained to determine, based on the user input, a vehicle function from the first group of vehicle functions that most likely corresponds to the desired vehicle function, and to generate, from the machine learning model, using the second training data and without changing the parameters of the language model, a second trained machine learning model which is trained to determine, based on the user input, a vehicle function from the second group of vehicle functions that most likely corresponds to the desired vehicle function.

9. Device (300) of a vehicle (302) for controlling vehicle functions, comprising: a storage element (102) on which a machine learning model is stored which has been generated using the method according to one of claims 1 to 7; a receiving module (304) which is configured to receive the user input in spoken form from the vehicle occupant (310); a processing module (306) which is configured to load the machine learning model from the storage element (102) and to execute it; and a control module (308) which is configured to select and control, based on the output of the machine learning model, a vehicle function which most likely corresponds to the desired vehicle function.

10. Device (300) according to claim 9, wherein the receiving module (304) is configured to convert the spoken user input into a text form that can be processed by the machine learning model.