Electronic device for vehicle, and method of operating same
The vehicle AI system addresses the limitations of existing AI by dynamically selecting models based on speech, context, and vehicle state analysis, ensuring efficient and personalized responses in diverse driving conditions.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- SAMSUNG ELECTRONICS CO LTD
- Filing Date
- 2025-11-13
- Publication Date
- 2026-06-25
AI Technical Summary
Existing vehicle AI systems struggle with responding appropriately to various situations such as speech topics, multiple users, sentence lengths, and multilingualism due to a lack of understanding of the vehicle environment and context, leading to inefficiencies and high costs when using large models.
The disclosure describes an electronic device for a vehicle that dynamically selects an appropriate response model by analyzing user speech, the vehicle interior context, and the driving-related vehicle state. Features from these three sources are combined to optimize model selection and provide personalized, efficient responses.
The system provides accurate, fast, and cost-effective responses tailored to the vehicle's dynamic environment, supporting multiple users with personalized AI services, enhancing safety and convenience during driving.
Abstract
Description
Vehicle electronic device and method of operation thereof
[0001] The present disclosure relates to an electronic device for a vehicle and a method of operation thereof, which dynamically determines a model to respond to a user's speech based on the speech of occupants in the vehicle and the situational context, and provides an AI service through the determined response model.
[0002] Recently, single models installed in vehicles or connected to networks recognize user utterances and support more natural communication. However, single models have limitations in appropriately responding to various situations—such as speech topics, multiple users, sentence lengths, and multilingualism—or in providing the responses users expect. In particular, a lack of ability to understand the vehicle environment and context makes it difficult to sustain an effective conversation. Furthermore, even though sufficiently appropriate responses could be generated with a small model, the use of unnecessarily large models sometimes leads to inefficiencies and high costs. To address these issues, it is necessary to provide users with more appropriate and efficient responses by selecting and utilizing the appropriate model according to the situation, rather than relying on a single model.
[0003] According to one aspect of the present disclosure, a method of operation for an electronic device for a vehicle may be provided. The method may include the step of acquiring first features corresponding to a plurality of items of speech analysis information based on a user's speech. The method may include the step of acquiring second features corresponding to a plurality of items of context analysis information representing the vehicle interior context based on one or more images taken of the vehicle interior. The method may include the step of acquiring third features corresponding to a plurality of items of vehicle state information representing the vehicle state related to driving based on sensor data from one or more sensors in the vehicle. The method may include the step of selecting a model to respond to the speech from a plurality of models by mapping at least one of the first features, the second features, or the third features to model analysis data. The method may include the step of processing the speech through the selected model to provide a response. The model analysis data may be a set of features corresponding to a plurality of analysis items determined based on the metadata of the plurality of models and the responses of the plurality of models to previous speeches.
[0004] According to one aspect of the present disclosure, an electronic device for a vehicle may be provided. The electronic device for a vehicle may include at least one processor comprising a processing circuit; and a memory for storing instructions. By executing the instructions by the at least one processor, the electronic device for a vehicle may acquire first features corresponding to a plurality of items of speech analysis information based on a user's speech. By executing the instructions by the at least one processor, the electronic device for a vehicle may acquire second features corresponding to a plurality of items of context analysis information representing a vehicle interior context based on one or more images taken of the vehicle interior. By executing the instructions by the at least one processor, the electronic device for a vehicle may acquire third features corresponding to a plurality of items of vehicle state information representing a vehicle state related to driving based on sensor data from one or more sensors within the vehicle. By executing the above instructions by the at least one processor, the vehicle electronic device can map at least one of the first features, the second features, or the third features to model analysis data and select a model among a plurality of models to respond to the utterance. By executing the above instructions by the at least one processor, the vehicle electronic device can process the utterance through the selected model and provide a response. The model analysis data may be a set of features corresponding to a plurality of analysis items determined based on the metadata of the plurality of models and the responses of the plurality of models to previous utterances.
[0005] According to one aspect of the present disclosure, a computer-readable recording medium may be provided having a program recorded thereon for executing any one of the methods described above and below, wherein an electronic device for a vehicle controls the operation of the vehicle.
[0006] FIG. 1 is a drawing for exemplarily illustrating the operation of an electronic device for a vehicle according to one embodiment of the present disclosure.
[0007] FIG. 2 is a flowchart illustrating the operation of an electronic device for a vehicle according to one embodiment of the present disclosure.
[0008] FIG. 3 is a diagram illustrating the process of a vehicle electronic device providing a response according to one embodiment of the present disclosure.
[0009] FIG. 4 is a flowchart illustrating the operation of a vehicle electronic device according to one embodiment of the present disclosure providing a response and analyzing a model.
[0010] FIG. 5a is a diagram illustrating, by way of example, an artificial intelligence model used by a vehicle electronic device according to one embodiment of the present disclosure to perform an analysis task.
[0011] FIG. 5b is a diagram illustrating, by way of example, an artificial intelligence model used by a vehicle electronic device according to one embodiment of the present disclosure to perform an analysis task.
[0012] FIG. 6 is a diagram illustrating speech analysis information generated by a vehicle electronic device according to one embodiment of the present disclosure.
[0013] FIG. 7 is a diagram illustrating context analysis information generated by a vehicle electronic device according to one embodiment of the present disclosure.
[0014] FIG. 8 is a drawing for explaining vehicle status information obtained by a vehicle electronic device according to one embodiment of the present disclosure.
[0015] FIG. 9 is a diagram illustrating the operation of a vehicle electronic device according to one embodiment of the present disclosure to select an optimal model.
[0016] FIG. 10 is a diagram illustrating model response analysis data generated by a vehicle electronic device according to one embodiment of the present disclosure.
[0017] FIG. 11 is a drawing for explaining model analysis data generated by a vehicle electronic device according to one embodiment of the present disclosure.
[0018] FIG. 12 is a diagram illustrating the operation of a vehicle electronic device according to one embodiment of the present disclosure connecting a situational response model.
[0019] FIG. 13 is a drawing for explaining how an electronic device for a vehicle according to one embodiment of the present disclosure operates in conjunction with a server.
[0020] FIG. 14 is a block diagram illustrating the configuration of an electronic device for a vehicle according to one embodiment of the present disclosure.
[0021] FIG. 15 is a block diagram illustrating the configuration of an electronic device for a vehicle according to one embodiment of the present disclosure.
[0022] The terms used in this specification will be briefly explained, and the present disclosure will be described in detail. In the present disclosure, the expression "at least one of a, b, or c" may refer to "a," "b," "c," "a and b," "a and c," "b and c," "all of a, b, and c," or variations thereof.
[0023] The terms used in this disclosure have been selected to be as widely used and general as possible, taking into account their functions within this disclosure; however, these terms may vary depending on the intent of those skilled in the art, case law, the emergence of new technologies, etc. Additionally, in specific cases, terms have been selected at the applicant's discretion, and in such cases, their meanings will be described in detail in the relevant explanatory sections. Therefore, terms used in this disclosure should be defined not merely by their names, but based on their meanings and the overall content of this disclosure.
[0024] Singular expressions may include plural expressions unless the context clearly indicates otherwise. Terms used herein, including technical or scientific terms, may have the same meaning as generally understood by those skilled in the art to which this specification pertains. Additionally, terms including ordinal numbers, such as "first" or "second," used in this specification may be used to describe various components, but said components should not be limited by said terms. Such terms are used solely for the purpose of distinguishing one component from another.
[0025] When a part is described in the specification as "comprising" a certain component, this means that, unless specifically stated otherwise, it does not exclude other components but may include additional components. Furthermore, terms such as "part" or "module" as used in the specification refer to a unit that processes at least one function or operation, and such a unit may be implemented in hardware, in software, or as a combination of hardware and software.
[0026] Embodiments of the present disclosure are described below with reference to the attached drawings so that those skilled in the art can easily implement the invention. However, the present disclosure may be embodied in various different forms and is not limited to the embodiments described herein. Furthermore, parts unrelated to the description have been omitted from the drawings to clearly explain the present disclosure. Additionally, for convenience of explanation, similar reference numbers have been used throughout the specification for similar components.
[0027] The present disclosure will be described below with reference to the attached drawings.
[0028] FIG. 1 is a drawing for exemplarily illustrating the operation of an electronic device for a vehicle according to one embodiment of the present disclosure.
[0029] Referring to FIG. 1, a vehicle electronic device (1000) can be installed in a vehicle (100). The vehicle electronic device (1000) can provide an AI service that outputs a response to an utterance by a passenger (user) inside the vehicle (100) based on information obtainable from inside or outside the vehicle.
[0030] The conversational AI service within the vehicle (100) differs from AI services on general mobile devices or personal computers. For example, the vehicle (100) has very high mobility, so the conversational AI service must operate in a continuously changing environment (location, speed, network connection, etc.). Because safety is very important for the vehicle (100), the responses from the conversational AI service must be stable and reliable. In addition, multiple people can use the vehicle (100) simultaneously, so the conversational AI service in the vehicle (100) requires a personalized response for each speaking user.
[0031] Accordingly, the vehicle electronic device (1000) can mount multiple models within the vehicle (100) or connect multiple models outside the vehicle (100), and can select the most suitable model according to the user's speech and context, and the state of the vehicle (100).
[0032] The vehicle electronic device (1000) can perform speech analysis (110) of passengers in the vehicle (100). The vehicle electronic device (1000) can generate speech analysis information through speech analysis (110) and use the features of the speech analysis information to select a response model.
[0033] The vehicle electronic device (1000) can perform internal vehicle context analysis (120) using an internal image of the vehicle (100). The vehicle electronic device (1000) can generate context analysis information through context analysis (120) and use the features of the context analysis information to select a response model.
[0034] The vehicle electronic device (1000) can obtain vehicle state information (130) from one or more sensors in the vehicle. The vehicle electronic device (1000) can use features of the vehicle state information to select a response model.
[0035] The present disclosure includes data generation and processing operations for providing a model optimized for a user. The vehicle electronic device (1000) processes collected data to obtain speech analysis information, context analysis information, and vehicle status information, generates model analysis data for a plurality of models, and performs mapping based on the model analysis data to select an optimal model.
[0036] Features of speech analysis information, context analysis information, vehicle status information, and model analysis data are features designed to respond to various situations and user conditions of the vehicle (100). The vehicle electronic device (1000) can contribute to creating a safe and convenient driving environment by providing accurate and fast responses using a model selected through feature mapping. Furthermore, by providing an appropriate model for the situation, it is possible to provide an efficient conversational artificial intelligence service by selecting a low-cost model without the inefficiency of using a high-cost model. Additionally, it is possible to provide personalized model services suitable for multiple users riding in the vehicle (100).
[0037] Specific operations of a vehicle electronic device (1000) dynamically selecting a response model to provide AI services will be described in more detail through the drawings and descriptions thereof.
[0038] FIG. 2 is a flowchart illustrating the operation of an electronic device for a vehicle according to one embodiment of the present disclosure.
[0039] In operation S210, the vehicle electronic device (1000) can acquire first features corresponding to a plurality of items of speech analysis information based on the user's speech.
[0040] In one embodiment, the vehicle electronic device (1000) can acquire a user's speech. The user's speech can be acquired through a microphone in the vehicle, and the vehicle electronic device (1000) may include a microphone, or the vehicle electronic device (1000) and the microphone may be interconnected so as to exchange data.
[0041] A vehicle electronic device (1000) can convert speech into text using a Speech-to-Text (STT) model. The converted text can be used as input data for a speech analysis model. In one embodiment, to use the text as input data for a speech analysis model, a certain preprocessing (e.g., removal of whitespace, removal of noise, etc.) may be performed.
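The preprocessing step mentioned above could look roughly like the following minimal sketch. The disclosure names only "removal of whitespace" and "removal of noise" and does not define either, so the filler-word list, the bracketed noise-tag pattern, and the function name `preprocess_transcript` are all assumptions made for illustration.

```python
import re

# Hypothetical filler words and noise-tag pattern; the disclosure does not list them.
FILLERS = {"uh", "um", "er"}
NOISE_TAG = re.compile(r"\[[^\]]*\]")  # e.g. "[noise]" tags that some STT engines emit


def preprocess_transcript(text: str) -> str:
    """Drop noise tags and filler words, then collapse runs of whitespace."""
    text = NOISE_TAG.sub(" ", text)
    words = [w for w in text.split() if w.lower().strip(",.") not in FILLERS]
    return " ".join(words)


print(preprocess_transcript("  um, turn   on [noise] the  heater "))
```

The cleaned string would then be passed to the speech analysis model in place of the raw STT output.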
[0042] The electronic device for a vehicle (1000) can analyze a user's speech using a speech analysis model and obtain speech analysis information. The speech analysis model may be an artificial intelligence model for receiving text input and inferring various analysis items related to speech. For example, the speech analysis model may be a language processing model designed based on a Transformer architecture, or through an extension or variation of said architecture, but is not limited thereto. In one embodiment, the speech analysis model may be implemented based on a multimodal model (e.g., a large-scale multimodal model) capable of processing text and audio to obtain features that analyze speech speed, tone of voice, etc. Specific details regarding the speech analysis model will be further described with reference to FIGS. 5a-5b.
[0043] Speech analysis information may include features corresponding to multiple items related to the utterance. Features included in the speech analysis information may include, for example, topic, sentence length, speech rate, tone, recency, urgency, emotion, language, etc., but are not limited thereto. Specific details regarding the speech analysis information will be further described with reference to FIG. 6.
[0044] In operation S220, the vehicle electronic device (1000) can acquire second features corresponding to multiple items of context analysis information representing the vehicle interior context based on one or more images taken of the vehicle interior.
[0045] In one embodiment, the vehicle electronic device (1000) may acquire an image (or video) of the interior of the vehicle. The image of the interior of the vehicle may be acquired by a Driver Monitoring System (DMS) or an Occupant Monitoring System (OMS). The DMS or OMS may include one or more cameras. The DMS or OMS may be included as a component of the vehicle electronic device (1000), or may be interconnected so that the vehicle electronic device (1000) and the DMS or OMS can exchange data.
[0046] The vehicle electronic device (1000) can analyze the internal context of the vehicle and obtain context analysis information using a context analysis model. The context analysis model may be an artificial intelligence model that receives an image as input and infers various analysis items related to the context of scenes and situations within the vehicle. For example, the context analysis model may be an image processing model designed through extension or modification based on a Convolutional Neural Network (CNN) architecture, but is not limited thereto.
[0047] In one embodiment, the context analysis model may be an artificial intelligence model that receives images and audio as input and infers context analysis information. The audio may be sounds inside the vehicle sensed by a microphone inside the vehicle. The context analysis model may be, for example, a Large Multimodal Model (LMM) that integrates and processes various data such as images, audio, and text based on a language model. The LMM may include one or more neural network layers to process input data and extract meaningful features from each modality. For example, the LMM may vectorize image data and extract features using a CNN-based image encoder. As another example, the LMM may vectorize audio data and extract features by converting the audio data into spectrograms and then applying a CNN-based encoder, or by using a Transformer-based audio encoder. As yet another example, the LMM may vectorize text and extract features using a Transformer-based model. Furthermore, the LMM may include one or more neural network layers to fuse the features of each modality and learn the relationships between them. Learning the relationships between features can be implemented through an attention mechanism or a fully connected layer, but is not limited thereto. Additionally, the LMM may include an output layer for outputting the final features included in the context analysis information. Specific details regarding the context analysis model will be further described with reference to FIGS. 5a-5b.
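As a rough illustration of attention-based fusion of per-modality features, the sketch below weights each modality vector by its scaled dot-product similarity to a query vector and sums the weighted vectors. It is a toy under stated assumptions: in an actual LMM the query, keys, and values would be learned projections of encoder outputs, whereas here they are fixed lists, and the names `softmax` and `attention_fuse` are illustrative rather than taken from the disclosure.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(query, modality_vectors):
    """Fuse modality vectors with scaled dot-product attention.

    query: list of floats; modality_vectors: dict mapping a modality name
    to a list of floats of the same length as the query.
    Returns (fused_vector, weight_per_modality).
    """
    dim = len(query)
    names = list(modality_vectors)
    scores = [
        sum(q * v for q, v in zip(query, modality_vectors[n])) / math.sqrt(dim)
        for n in names
    ]
    weights = softmax(scores)
    fused = [
        sum(w * modality_vectors[n][i] for w, n in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


fused, weights = attention_fuse(
    [1.0, 0.0],
    {"image": [1.0, 0.0], "audio": [0.0, 1.0], "text": [1.0, 0.0]},
)
```

In this toy input the image and text vectors align with the query and therefore receive equal weights that are larger than the audio weight.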
[0048] Context analysis information may include features corresponding to multiple items related to the context inside the vehicle. Features included in the context analysis information may include, for example, passenger information, relationships between passengers, seat occupancy status, scene / situation, recognition of dangerous behavior (e.g., smartphone use, drowsiness, etc.), passenger status (e.g., emotion, gaze, posture, etc.), but are not limited thereto. Specific details regarding the context analysis information will be further described with reference to FIG. 7.
[0049] In operation S230, the vehicle electronic device (1000) can acquire third features corresponding to a plurality of items of vehicle state information representing a vehicle state related to driving, based on sensor data from one or more sensors in the vehicle.
[0050] In one embodiment, the vehicle electronic device (1000) can acquire sensor data from one or more sensors mounted in the vehicle. The one or more sensors mounted in the vehicle may include, for example, a speed sensor (e.g., Vehicle Speed Sensor; VSS), a location sensor (e.g., GPS), a communication-related sensor (e.g., a Bluetooth module, a cellular module (LTE / 5G / 6G), V2X (Vehicle to Everything)), a temperature sensor, a humidity sensor, an illuminance sensor, etc., but are not limited thereto.
[0051] In one embodiment, the vehicle electronic device (1000) can acquire information about applications running while the vehicle is driving. The operating system of the vehicle electronic device (1000) can collect application information (e.g., navigation app, phone app, music app, etc.), execution status (e.g., running, running in the background, standby, etc.), events (e.g., push notification, reception of user input, execution of function, etc.), patterns (e.g., usage time, app type), and usage environment (e.g., Bluetooth connection, cellular network, etc.).
[0052] Vehicle status information may include features corresponding to multiple items related to the driving of the vehicle. Features included in the vehicle status information may include, for example, speed, location, network status, environmental information (temperature, humidity, etc.), navigation information (traffic conditions, destination, route, travel time, etc.), and information on running applications, but are not limited thereto. Specific details regarding the vehicle status information will be further described with reference to FIG. 8.
[0053] In operation S240, the vehicle electronic device (1000) can map at least one of the first features, the second features, or the third features to model analysis data and select a model to respond to a speech from among a plurality of models. The plurality of models may include models stored in the storage of the vehicle electronic device (1000) and operated in an on-device form, and models stored in an external device (e.g., a server) and operated based on cloud computing.
[0054] In one embodiment, the vehicle electronic device (1000) may select a model to respond to a speech from among a plurality of models by using at least one of speech analysis information, context analysis information, or vehicle status information. For example, the vehicle electronic device (1000) may select a model to respond to a speech based solely on speech analysis information. Alternatively, the vehicle electronic device (1000) may select a model to respond to a speech based on all of speech analysis information, context analysis information, and vehicle status information. However, it is not limited thereto, and the model selection may be determined by any one of speech analysis information, context analysis information, and vehicle status information, or a combination thereof. The plurality of models may include, for example, a language model (e.g., a large-scale language model), an image model, and an audio model, but are not limited thereto.
[0055] The vehicle electronic device (1000) can identify the availability status of the multiple models and select a model to provide a response based on the model analysis data of the available models. The model analysis data may include features corresponding to multiple analysis items that are mappable to speech analysis information, context analysis information, and vehicle status information. The features of the model analysis data may be determined based on the metadata of the multiple models and the responses of the multiple models to speech. Specific details regarding the vehicle electronic device (1000) analyzing the responses of the models and generating model analysis data will be further described with reference to FIGS. 10 and 11.
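One way to picture mapping request features to model analysis data is a simple match count over the available models, as in the sketch below. The disclosure does not specify a mapping function here, so the scoring rule, the tie-break on cost, and every name (`ModelProfile`, `match_score`, `select_model`, the example model names and feature values) are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ModelProfile:
    """One model's analysis data (field names are illustrative)."""
    name: str
    cost: float  # relative cost per response (assumed analysis item)
    features: dict = field(default_factory=dict)  # analysis item -> set of covered values


def match_score(request: dict, profile: ModelProfile) -> int:
    """Count the request features whose value the model's analysis data covers."""
    return sum(
        1 for item, value in request.items()
        if value in profile.features.get(item, set())
    )


def select_model(request: dict, profiles: list, available: set) -> ModelProfile:
    """Pick the best-matching available model; ties go to the cheaper model."""
    candidates = [p for p in profiles if p.name in available]
    if not candidates:
        raise LookupError("no model is currently available")
    return max(candidates, key=lambda p: (match_score(request, p), -p.cost))


profiles = [
    ModelProfile("small-on-device", 1.0,
                 {"topic": {"climate", "music"}, "language": {"en"}}),
    ModelProfile("large-cloud", 10.0,
                 {"topic": {"climate", "music", "travel"}, "language": {"en", "ko"}}),
]
choice = select_model({"topic": "climate", "language": "en"},
                      profiles, available={"small-on-device", "large-cloud"})
```

Under these assumptions a climate request in English matches both models equally, so the cheaper on-device model is chosen, while a travel request would go to the larger model if it is available.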
[0056] The vehicle electronic device (1000) can independently perform speech analysis, context analysis, and vehicle status identification for each speech input to the vehicle electronic device (1000). For example, speech analysis, context analysis, and vehicle status identification may be performed for a first speech to select a model to respond to the first speech, and speech analysis, context analysis, and vehicle status identification may be performed for a second speech to select a model to respond to the second speech. That is, the optimal response model can be dynamically selected by the vehicle electronic device (1000) for each speech.
[0057] In one embodiment, when the vehicle electronic device (1000) provides a response for each utterance, the vehicle electronic device (1000) may determine whether to perform utterance analysis, context analysis, and vehicle status identification based on whether the user's utterance includes a defined word or sentence. The defined word or sentence is intended to activate the artificial intelligence service of the vehicle electronic device (1000) and may be referred to as a wakeup word or wakeup sentence. The vehicle electronic device (1000) may constantly listen to the voice of users in the vehicle (e.g., driver or passenger) and, when the user utters a defined word or sentence to request a response, decide to perform utterance analysis, context analysis, and vehicle status identification.
[0058] In one embodiment, the vehicle electronic device (1000) may perform only speech analysis while constantly listening to the voices of users in the vehicle. If the speech analysis results identify that the user's speech intent is a request for a response, the vehicle electronic device (1000) may begin to perform context analysis and vehicle status identification. The vehicle electronic device (1000) may select an optimal model to respond to the speech based on at least one of speech analysis information, context analysis information, or vehicle status information.
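The two triggering embodiments described above can be contrasted in a short sketch. The wake-up phrase, the `"request"` intent label, and both function names are hypothetical; the disclosure does not define a concrete phrase or an intent vocabulary.

```python
WAKE_PHRASES = ("hi car",)  # hypothetical wake-up phrase, not from the disclosure
ALL_ANALYSES = ["speech", "context", "vehicle_state"]


def plan_with_wake_phrase(transcript: str) -> list:
    """Wake-up embodiment: run every analysis only when a wake-up phrase is heard."""
    heard = any(phrase in transcript.lower() for phrase in WAKE_PHRASES)
    return list(ALL_ANALYSES) if heard else []


def plan_always_listening(intent: str) -> list:
    """Always-listening embodiment: speech analysis runs for every utterance;
    context analysis and vehicle state identification start only when the
    inferred intent is a request for a response."""
    if intent == "request":
        return list(ALL_ANALYSES)
    return ["speech"]
```

The second variant spends speech analysis on every utterance but defers the image and sensor processing until a response is actually requested.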
[0059] In operation S250, the vehicle electronic device (1000) can process the utterance through a selected model and provide a response.
[0060] In one embodiment, the selected model may be a generative artificial intelligence model that processes a user's utterance to generate a response. The generative model may include at least one of a language model that generates text, an image model that generates images, or an audio model that generates audio. Since the selected model is selected based on at least one of utterance analysis information, context analysis information, or vehicle status information, the vehicle electronic device (1000) can obtain a response to the utterance by using a model optimized for processing the user's utterance by reflecting the user's current state. The vehicle electronic device (1000) can input the user's utterance into the selected model and provide the response output from the generative model to the user.
[0061] In one embodiment, the vehicle electronic device (1000) can generate a prompt to be input into a generative model. For example, the vehicle electronic device (1000) can generate a prompt to be input into a generative model based on speech analysis information, context analysis information, or vehicle status information. Specifically, the vehicle electronic device (1000) can select at least some of the features included in the speech analysis information (first features), the features included in the context analysis information (second features), and the features included in the vehicle status information (third features). The vehicle electronic device (1000) can generate a prompt that includes at least some of the features included in the speech analysis information, context analysis information, or vehicle status information, along with the original speech text of the user. The prompt may include the original speech text and the values of the first features, the second features, and the third features, or may be a newly processed sentence based on the original speech text and the values of the first features, the second features, and the third features. Language models may be used to generate input prompts, but are not limited thereto.
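A minimal sketch of the first prompt variant, in which the original speech text is combined with selected feature values, might look as follows. The prompt layout, the `keep` filter, and the feature names in the example are assumptions; the disclosure does not prescribe a prompt format.

```python
def build_prompt(utterance, speech=None, context=None, vehicle=None, keep=None):
    """Combine the original utterance text with selected feature values.

    speech, context, vehicle: dicts of first, second, and third features.
    keep: optional set of feature names to include; None includes all.
    """
    groups = {"speech": speech or {}, "context": context or {}, "vehicle": vehicle or {}}
    lines = []
    for group, features in groups.items():
        for key, value in features.items():
            if keep is None or key in keep:
                lines.append(f"- {group}.{key}: {value}")
    header = ("Situation:\n" + "\n".join(lines) + "\n") if lines else ""
    return f"{header}User said: {utterance}"


print(build_prompt(
    "Find a rest stop",
    speech={"urgency": "high", "language": "en"},
    vehicle={"speed_kmh": 110},
    keep={"urgency", "speed_kmh"},
))
```

The second variant described above, a newly processed sentence, would instead pass these same inputs through a language model to rewrite them into a single instruction.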
[0062] In one embodiment, a response output from a selected model may be output through a speaker or a display and provided to a user. In this case, at least one of a display or a speaker may be included in the vehicle electronic device (1000), or the vehicle electronic device (1000) and at least one of a display or a speaker may be interconnected so as to exchange data.
[0063] In one embodiment, the response output from the selected model may include commands for controlling the vehicle. For example, it may include commands for driving control of the vehicle (acceleration, deceleration, starting, stopping, steering, etc.) or function control of the vehicle (e.g., volume control, climate control, lighting control, seat position control, etc.) or application control using the vehicle electronic device (1000) (e.g., navigation app, phone app, music app, etc.). The vehicle electronic device (1000) may process the commands directly or transmit the commands to a separate control device for controlling the vehicle so that the functions are performed.
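The choice between processing a command directly and forwarding it to a separate control device can be sketched as a small dispatcher. Which command categories are handled locally is not stated in the disclosure; the split below (function and application control handled locally, driving control forwarded) and the command dictionary shape are purely assumptions.

```python
# Assumption: these categories are handled by the vehicle electronic device itself.
LOCAL_DOMAINS = {"function", "application"}


def dispatch(commands, handle_locally, forward_to_controller):
    """Route each command either to a local handler or to a separate control device.

    commands: iterable of dicts with at least a "domain" key
    ("driving", "function", or "application" in this sketch).
    """
    for command in commands:
        if command["domain"] in LOCAL_DOMAINS:
            handle_locally(command)
        else:
            forward_to_controller(command)


local, forwarded = [], []
dispatch(
    [{"domain": "function", "action": "volume_up"},
     {"domain": "driving", "action": "decelerate"}],
    local.append,
    forwarded.append,
)
```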
[0064] FIG. 3 is a diagram illustrating the process of a vehicle electronic device providing a response according to one embodiment of the present disclosure.
[0065] Referring to FIG. 3, the vehicle electronic device (1000) may include at least a speech analysis module (310), a context analysis module (320), a vehicle state information management module (330), a model analysis module (340), a model connection module (350), and a real-time model status check module (360).
[0066] Each module illustrated in FIG. 3 may be composed of a combination of a series of codes and data to implement a specific function. Each module may process operations corresponding to some of a series of tasks for the vehicle electronic device (1000) to analyze user speech and data related to the vehicle and to select a model to respond to the speech. Each module may interact with one another by transmitting data to one another. At least one processor included in the vehicle electronic device (1000) may process one or more tasks to enable the function of each module to be performed by executing instructions of the code constituting each module. The function of each module is described below.
[0067] The speech analysis module (310) can receive an audio signal containing a user's speech and generate speech analysis information including features related to the speech. The speech analysis module (310) may include, but is not limited to, an STT model that converts speech into text, a text preprocessing module, and a speech analysis model that infers features of the speech analysis information. The speech analysis information can be transmitted to the model analysis module (340) and the model connection module (350).
[0068] The context analysis module (320) can receive a video signal (or image data) captured inside the vehicle and generate context analysis information including features related to the context inside the vehicle. In one embodiment, the context analysis module can additionally receive an audio signal that senses sounds occurring inside the vehicle, including speech and conversations between users, to generate context analysis information. The context analysis module (320) may include an audio preprocessing module, an image (or video) preprocessing module, and a context analysis model that infers features of the context analysis information, but is not limited thereto. The context analysis information can be transmitted to the model analysis module (340) and the model connection module (350).
[0069] The vehicle state information management module (330) can collect and manage vehicle signals. The vehicle state information management module (330) can collect and manage sensor data sensed from one or more sensors during the driving of the vehicle and data related to one or more applications running on the vehicle electronic device (1000). The vehicle state information management module (330) can update features of the vehicle state information in real time. Alternatively, the vehicle state information management module (330) can collect and store features of the vehicle state information periodically or non-periodically. The vehicle state information can be transmitted to the model analysis module (340) and the model connection module (350).
[0070] The model analysis module (340) can analyze multiple models capable of providing responses to user utterances and generate model analysis data by filling in features corresponding to analysis items for each of the multiple models. The model analysis module (340) can determine the features of the analysis items for each model based on the metadata of the multiple models and the responses of the multiple models to utterances. In this case, for each response, the utterance analysis information, context analysis information, and vehicle status information used when the response is provided can be used together during model analysis. For example, the model analysis module (340) can generate model analysis data corresponding to Model A based on the metadata of Model A and the responses of Model A to utterances, and generate model analysis data corresponding to Model B based on the metadata of Model B and the responses of Model B to utterances. The model analysis data can be transmitted to the model connection module (350) and the real-time model status check module (360). The model analysis module (340) can identify available models from the real-time model status check module (360) and load model analysis data corresponding to the available models. The model analysis module (340) can transmit only the model analysis data corresponding to the available models to the model connection module (350).
[0071] The model connection module (350) can dynamically determine a model to respond to a speech. The model connection module (350) can map features of speech analysis information, context analysis information, and vehicle status information to model analysis data. The model analysis data may be model analysis data corresponding to available models among multiple models. By mapping features of speech analysis information, context analysis information, and vehicle status information to features of model analysis data, the model connection module (350) can determine and connect an optimal model suitable for various variable situations within the vehicle, such as speech length, urgency, language, topic, driving mode, user profile, traffic conditions, and passenger information. Once the optimal model is connected, the vehicle electronic device (1000) can process the speech through the connected model so that a response is provided to the user.
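The mapping described in paragraph [0071] can be sketched as a simple scoring pass over the model analysis data. This is only an illustration under stated assumptions, not the disclosed implementation: the `ModelProfile` fields, the feature keys, and the equal weighting of each match are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ModelProfile:
    """Hypothetical model analysis data for one model (field names are assumed)."""
    name: str
    topics: Set[str] = field(default_factory=set)
    languages: Set[str] = field(default_factory=set)
    fast_response: bool = False


def score_model(profile: ModelProfile, features: Dict[str, str]) -> int:
    """Count how many situational features the model profile satisfies."""
    score = 0
    if features.get("topic") in profile.topics:
        score += 1
    if features.get("language") in profile.languages:
        score += 1
    # An urgent request favors models known to answer quickly.
    if features.get("urgency") == "high" and profile.fast_response:
        score += 1
    return score


def select_model(profiles: List[ModelProfile], features: Dict[str, str]) -> ModelProfile:
    """Return the best-scoring profile; ties go to the earliest entry."""
    return max(profiles, key=lambda p: score_model(p, features))


# Example with two assumed models and one set of situational features.
profiles = [
    ModelProfile("Model A", {"entertainment"}, {"ko", "en"}, fast_response=True),
    ModelProfile("Model B", {"navigation"}, {"en"}),
]
features = {"topic": "entertainment", "language": "ko", "urgency": "high"}
best = select_model(profiles, features)
```

A real system would likely weight features differently (for example, treating language support as a hard constraint), but the sketch shows how situational features and per-model analysis data can be compared in one pass.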
[0072] The real-time model status check module (360) can check the availability of multiple models based on the identification of a user's utterance. The multiple models may be generative artificial intelligence models including language models, image models, or audio models. The real-time model status check module (360) checks the response availability status and response speed of the models and can transmit information about the available models to the model analysis module (340) and the model connection module (350). The model analysis module (340) identifies model analysis data of the available models and transmits it to the model connection module (350), and the model connection module (350) can finally determine the response model based on information about the available models, model analysis data, utterance analysis information, context analysis information, and vehicle status information. Once the response model is determined, the vehicle electronic device (1000) can process the utterance using the response model and obtain the response output from the response model and provide it to the user.
[0073] Meanwhile, the modules illustrated in FIG. 3 are for convenience of explanation and are not necessarily limited thereto. Another module may be added for the vehicle electronic device (1000) to analyze data and select a model, and some modules may be omitted. Additionally, one module may be divided into multiple modules distinguished by detailed functions, and some of the aforementioned modules may be combined to be implemented as a single module.
[0074] FIG. 4 is a flowchart illustrating operations in which a vehicle electronic device according to one embodiment of the present disclosure provides a response and analyzes models.
[0075] In operation S410, the vehicle electronic device (1000) may perform a data processing operation. The data processing operation may include processes for collecting and analyzing data. The data processing operation may include speech analysis, context analysis, and vehicle status identification. Operation S410 may be triggered based on inputting a user's speech. For example, the data processing operation may be triggered based on the identification of the user's speech itself, or based on the inclusion of a defined word or sentence in the user's speech. When operation S410 is initiated, the speech analysis, context analysis, and vehicle status identification operations may be performed in parallel. In one embodiment, only speech analysis is performed first, and if the speech analysis result identifies the user's speech intent as requesting a response, the context analysis and vehicle status identification operations may be performed.
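The parallel execution of the three analyses in operation S410 could look roughly like the following sketch. The three analysis functions are placeholders invented for illustration (they do not represent the speech analysis model, the context analysis model, or any real vehicle signal interface), and the thread pool is just one possible way to run them concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_speech(text):
    """Placeholder speech analysis: classify sentence length by word count."""
    return {"sentence_length": "concise" if len(text.split()) <= 5 else "long"}


def analyze_context(cabin_frame):
    """Placeholder context analysis: report which seats are occupied."""
    return {"seat_occupancy": sorted(cabin_frame.get("occupied_seats", []))}


def read_vehicle_status(signals):
    """Placeholder vehicle status identification: pass through the speed signal."""
    return {"speed_kph": signals.get("speed_kph")}


def run_data_processing(text, cabin_frame, signals):
    """Run the three S410 analyses concurrently and gather their results."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        speech = pool.submit(analyze_speech, text)
        context = pool.submit(analyze_context, cabin_frame)
        status = pool.submit(read_vehicle_status, signals)
        return {
            "speech": speech.result(),
            "context": context.result(),
            "vehicle_status": status.result(),
        }


result = run_data_processing(
    "Play music",
    {"occupied_seats": ["passenger", "driver"]},
    {"speed_kph": 60},
)
```

The alternative embodiment, in which speech analysis runs first and the other two analyses run only if a response is requested, would replace the three simultaneous submissions with a conditional second stage.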
[0076] In operation S420, the vehicle electronic device (1000) can identify available models. The vehicle electronic device (1000) can examine the response availability status and response speed of multiple models and obtain information about the available models. For example, the availability status of the models can be identified by checking whether model files of on-device models stored in the storage of the vehicle electronic device (1000) exist, whether computing resources are available, etc. Additionally, for example, the vehicle electronic device (1000) can identify the availability status of the models by checking the network connection status and server load status of cloud-based models.
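As a rough sketch of the availability checks in operation S420, the function below distinguishes on-device models (model file present, enough computing resources) from cloud-based models (network connected, server load acceptable). The `ModelEntry` type, the `kind` labels, and the threshold values are assumptions made for this example; the disclosure does not specify them.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ModelEntry:
    name: str
    kind: str  # "on_device" or "cloud" (labels assumed for this sketch)
    model_path: Optional[Path] = None  # only meaningful for on-device models


def is_available(entry, free_memory_mb, network_up, server_load,
                 min_memory_mb=512, max_server_load=0.9):
    """Return True if the model can currently respond.

    free_memory_mb, network_up, and server_load are supplied by the caller;
    how they are measured is outside the scope of this sketch.
    """
    if entry.kind == "on_device":
        # The model file must exist and enough memory must be free.
        has_file = entry.model_path is not None and entry.model_path.exists()
        return has_file and free_memory_mb >= min_memory_mb
    # Cloud-based model: needs a network connection and a server that is not overloaded.
    return network_up and server_load <= max_server_load
```

Passing the measurements in as arguments keeps the check itself free of platform-specific calls, so the same logic can be exercised with simulated values.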
[0077] The vehicle electronic device (1000) can identify whether model analysis data corresponding to available models exists based on information of available models. The model analysis data may be stored in the vehicle electronic device (1000) through an analysis operation based on the metadata of the models and the responses of the plurality of models to previous utterances.
[0078] In operation S430, the vehicle electronic device (1000) can dynamically determine a response model based on at least one of speech analysis information, context analysis information, or vehicle status information, and model analysis data of available models. For example, a response model suitable for a short speech or a long speech may be selected based on the speech length, which is a feature included in the speech analysis information, and the model analysis data. Alternatively, a response model possessing expertise suitable for the topic of the speech may be selected based on the topic, which is a feature included in the speech analysis information, and the model analysis data. Meanwhile, if model analysis data corresponding to available models does not exist, the vehicle electronic device (1000) can connect a basic model through operation S435. In this case, as the user continues to use the vehicle electronic device (1000), data accumulates and model analysis data is generated, and an optimal model can be connected by operation S430.
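The branch between operations S430 and S435 can be sketched as follows. The structure of the analysis data (a `suited_lengths` list per model) and the name of the basic model are hypothetical; only the fallback behavior reflects the paragraph above.

```python
def choose_response_model(available, analysis_data, speech_features,
                          basic_model="basic-model"):
    """Pick a response model, falling back to a basic model when no data exists.

    available: ordered list of currently available model names
    analysis_data: dict mapping model name to its (assumed) analysis data
    speech_features: dict of features from the speech analysis information
    """
    # Keep only available models that already have analysis data.
    candidates = {name: analysis_data[name]
                  for name in available if name in analysis_data}
    if not candidates:
        return basic_model  # corresponds to operation S435

    # Corresponds to operation S430: match on speech length in this sketch.
    wanted = speech_features.get("sentence_length")
    for name, data in candidates.items():
        if wanted in data.get("suited_lengths", ()):
            return name
    # No candidate matches this feature; use the first candidate.
    return next(iter(candidates))
```

As analysis data accumulates for more models, the first branch is taken less often and the selection in the second branch has more candidates to choose from.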
[0079] In operation S440, the vehicle electronic device (1000) can output a response. If the model to respond to the utterance is an on-device model stored within the vehicle electronic device (1000), the vehicle electronic device (1000) can process the utterance using the connected model and output a response. If the model to respond to the utterance is a cloud-based model, the vehicle electronic device (1000) can transmit the utterance to an external device (e.g., a server) so that the utterance is processed by the external device. The vehicle electronic device (1000) can receive and output a response from the external device. Meanwhile, when the vehicle electronic device (1000) processes the utterance using the model, at least one of utterance analysis information, context analysis information, or vehicle status information may be used as additional input data.
[0080] In operation S450, the vehicle electronic device (1000) may start model analysis. Model analysis may include the vehicle electronic device (1000) using multiple models to output a response for each. In this case, the multiple models for which model analysis is performed are not limited to the available models identified in operation S420. Since a model that was not available in operation S420 may be available at another time, the vehicle electronic device (1000) may perform model analysis on all models. For model analysis, the vehicle electronic device (1000) may collect responses from each model based on speech analysis information, context analysis information, vehicle status information, and speech obtained in the data processing operation (S410).
[0081] Meanwhile, the process of obtaining responses for multiple models may require a significant amount of computation and time. Therefore, the vehicle electronic device (1000) may start model analysis based on the vehicle electronic device (1000) entering an idle state. The idle state of the vehicle electronic device (1000) may mean a state in which the computing resources used by the vehicle electronic device (1000) are below a defined value, and may include cases where the vehicle is in motion, the engine is turned off, or it is in a standby state.
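One way to defer model analysis until the device is idle is to queue the collected data and drain the queue only when resource usage is below a threshold. The class below is a minimal sketch of that idea; the class name, the 0.3 threshold, and the representation of resource usage as a single fraction are assumptions made for illustration.

```python
from collections import deque


class DeferredModelAnalysis:
    """Queue analysis jobs and run them only while the device is idle."""

    def __init__(self, idle_threshold=0.3):
        # Usage below this fraction counts as idle (value assumed).
        self.idle_threshold = idle_threshold
        self.pending = deque()

    def enqueue(self, record):
        """Store data collected in S410 for later model analysis."""
        self.pending.append(record)

    def run_if_idle(self, resource_usage, analyze):
        """Run all pending jobs if usage is below the threshold, else do nothing."""
        if resource_usage >= self.idle_threshold:
            return []
        results = []
        while self.pending:
            results.append(analyze(self.pending.popleft()))
        return results
```

A production version would probably re-check the idle condition between jobs so that a long analysis run can be interrupted when the user starts interacting again.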
[0082] In operation S460, the vehicle electronic device (1000) can analyze responses by model. The vehicle electronic device (1000) can acquire metadata for each model and acquire model response analysis data based on utterances and utterance analysis information corresponding to the utterances, context analysis information and vehicle status information, and responses to the utterances. For example, the vehicle electronic device (1000) can acquire response analysis data that analyzes the response of Model A and response analysis data that analyzes the response of Model B. The response analysis data may include features corresponding to the response analysis items. The features of the response analysis items may include, for example, model name, model size, response speed, multi-language support status, training data date, model response, response amount, response accuracy, hyperparameter setting value, etc., but are not limited thereto.
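The response analysis data of operation S460 could be held in a simple record type such as the one sketched below. The field names follow the analysis items listed above, but the types, the metadata keys, and the use of a word count as the response amount are assumptions; response accuracy is left unset because the disclosure does not state how it is computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResponseAnalysisRecord:
    """Hypothetical container for one model's response analysis data."""
    model_name: str
    model_size: str
    response_speed_ms: float
    multilingual: bool
    training_data_date: str
    response: str
    response_amount: int  # word count, an assumed measure
    response_accuracy: Optional[float] = None  # scoring method not specified


def build_record(metadata, response, elapsed_ms):
    """Combine model metadata with one observed response."""
    return ResponseAnalysisRecord(
        model_name=metadata["name"],
        model_size=metadata["size"],
        response_speed_ms=elapsed_ms,
        multilingual=metadata["multilingual"],
        training_data_date=metadata["training_data_date"],
        response=response,
        response_amount=len(response.split()),
    )


record = build_record(
    {"name": "Model A", "size": "small", "multilingual": True,
     "training_data_date": "2025-01"},
    "Playing your music now",
    elapsed_ms=120.0,
)
```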
[0083] FIG. 5a is a diagram illustrating, by way of example, an artificial intelligence model used by a vehicle electronic device according to one embodiment of the present disclosure to perform an analysis task.
[0084] In describing the artificial intelligence models of FIG. 5a, detailed descriptions of known technologies related to artificial intelligence will be omitted.
[0085] In one embodiment, the vehicle electronic device (1000) may use artificial intelligence models to perform analysis tasks. For example, the vehicle electronic device (1000) may use a speech analysis model (510) and a context analysis model (520). The speech analysis model (510) and the context analysis model (520) may be stored in the vehicle electronic device (1000) after training is completed and performance verification is performed, so that inference tasks can be performed.
[0086] The vehicle electronic device (1000) can analyze a user's speech using the speech analysis model (510). Speech analysis features obtained using the speech analysis model (510) may include, for example, topic, sentence length, speech rate, tone of voice, recency, urgency, emotion, language, etc., but are not limited thereto.
[0087] In one embodiment, the speech analysis model (510) may be an artificial intelligence model that receives text converted from a speech and outputs speech analysis information. For example, the speech analysis model may be a language processing model designed based on a Transformer architecture, or through an extension or variation of said architecture, and may be, for example, a Large Language Model (LLM), but is not limited thereto.
[0088] The input layer of the speech analysis model (510) can perform preprocessing (e.g., normalization) to extract features from input data. The main network of the speech analysis model (510) can extract common shared features based on the input speech. The speech analysis model (510) may include task-specific heads that learn the characteristics of each feature to infer each feature of the speech analysis information. The output layer of the speech analysis model (510) may include sub-layers for inferring different output features, and each sub-layer may be trained to perform a predetermined inference task based on features received from the previous layer.
[0089] The speech analysis model (510) can be trained using training data consisting of a collection of texts representing speech, each labeled with correct values. For example, correct values for features such as 'topic, sentence length, tone, recency, urgency, emotion, language' may be labeled on the texts representing speech.
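The shared main network with task-specific heads can be illustrated with a toy forward pass. This is a sketch only: the weights are random and untrained, the layer sizes and label sets are invented, and a real implementation would use a Transformer-based network rather than a single dense layer.

```python
import random


def linear(x, weights, bias):
    """Apply a dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


class MultiHeadClassifier:
    """Shared feature extractor followed by one output head per feature."""

    def __init__(self, in_dim, hidden_dim, head_labels, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-1.0, 1.0) for _ in range(cols)]
                    for _ in range(rows)]

        # Shared main network (a single dense layer in this sketch).
        self.shared_w = matrix(hidden_dim, in_dim)
        self.shared_b = [0.0] * hidden_dim
        # One task-specific head per output feature.
        self.heads = {
            task: (matrix(len(labels), hidden_dim), [0.0] * len(labels), labels)
            for task, labels in head_labels.items()
        }

    def predict(self, x):
        """Compute shared features once, then let each head pick a label."""
        shared = relu(linear(x, self.shared_w, self.shared_b))
        output = {}
        for task, (weights, bias, labels) in self.heads.items():
            logits = linear(shared, weights, bias)
            output[task] = labels[logits.index(max(logits))]
        return output


model = MultiHeadClassifier(
    in_dim=4, hidden_dim=8,
    head_labels={"urgency": ["low", "medium", "high"],
                 "emotion": ["positive", "neutral", "negative"]},
)
prediction = model.predict([0.1, 0.2, 0.3, 0.4])
```

Because the weights are untrained, the predicted labels are arbitrary; the point is that every head reads the same shared features and produces its own output.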
[0090] The vehicle electronic device (1000) can prepare to perform speech analysis by training the speech analysis model (510) using training data. Once the training of the speech analysis model (510) is completed, the vehicle electronic device (1000) can obtain speech analysis information using the speech analysis model (510). In one embodiment, the vehicle electronic device (1000) can calculate the speech length (sec) of the speech from audio representing the speech. The vehicle electronic device (1000) can calculate the speech rate based on the speech length and the spoken sentence (e.g., sentence length, number of words, etc.). The speech rate may be included as a feature value of the speech analysis information.
[0091] The vehicle electronic device (1000) can analyze the internal context of the vehicle using the context analysis model (520). The context analysis features obtained using the context analysis model (520) may include, for example, passenger information, relationships between passengers, seat occupancy status, scene / situation, dangerous behavior recognition, passenger status, etc., but are not limited thereto.
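The speech rate calculation can be illustrated with a short function. Words per second is used here as one possible definition; the disclosure leaves the exact unit open, so the choice of word count over, say, character or syllable count is an assumption.

```python
def speech_rate(sentence: str, speech_length_sec: float) -> float:
    """Return the speech rate in words per second.

    sentence: text of the spoken sentence (e.g., from the STT model)
    speech_length_sec: duration of the speech measured from the audio
    """
    if speech_length_sec <= 0:
        raise ValueError("speech_length_sec must be positive")
    return len(sentence.split()) / speech_length_sec


rate = speech_rate("Play music", 1.0)  # two words spoken in one second
```

The resulting number could then be bucketed into labels such as "slow", "normal", or "fast" using thresholds chosen for the target language.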
[0092] In one embodiment, the context analysis model (520) may be an artificial intelligence model that receives an image (or video) inside a vehicle as input and outputs context analysis information. For example, the context analysis model may be an image processing model designed based on a CNN architecture, or through an extension or variation of said architecture, but is not limited thereto.
[0093] The input layer of the context analysis model (520) can perform preprocessing (e.g., resizing, normalization, etc.) to extract features from the input data. The main network of the context analysis model (520) can extract common shared features based on the input vehicle interior image. The context analysis model (520) may include a task-specific head that learns the characteristics of each feature to infer each feature of the context analysis information. The output layer of the context analysis model (520) may include sub-layers that infer each different output feature, and each sub-layer may be trained to perform a predetermined inference task based on the feature received from the previous layer.
[0094] The context analysis model (520) can be trained using training data consisting of a collection of images in which correct values are labeled on images of the interior of a vehicle. For example, correct values for features such as 'passenger information, relationships between passengers, seat occupancy status, scene / situation, dangerous behavior recognition, passenger status' may be labeled on the images of the interior of the vehicle.
[0095] The vehicle electronic device (1000) can be prepared to perform a context analysis task by training a context analysis model (520) using training data. Once the training of the context analysis model (520) is completed, the vehicle electronic device (1000) can obtain context analysis information using the context analysis model (520).
[0096] FIG. 5b is a diagram illustrating, by way of example, an artificial intelligence model used by a vehicle electronic device according to one embodiment of the present disclosure to perform an analysis task.
[0097] In describing the artificial intelligence models of FIG. 5b, details regarding known technologies related to artificial intelligence will be omitted.
[0098] In one embodiment, the vehicle electronic device (1000) may use a multimodal model capable of integrating and processing different types of input data. For example, the speech analysis model (510) and context analysis model (520) used by the vehicle electronic device (1000) may be multimodal models.
[0099] In one embodiment, the speech analysis model (510) may be a Large Multimodal Model (LMM) capable of receiving text and audio inputs, processing them integrally, and outputting speech analysis information. Text may refer to text converted from speech, and audio may refer to audio representing speech.
[0100] The speech analysis model (510) may include one or more neural network layers for audio processing and text processing. For example, the speech analysis model (510) may include an encoder. The speech analysis model (510) may convert audio data into a spectrogram and then apply a CNN-based encoder, or vectorize audio data and extract features using a transformer-based audio encoder. Alternatively, for example, the speech analysis model (510) may vectorize text and extract features using a transformer-based text encoder. Alternatively, for example, the encoder of the speech analysis model (510) may be a universal encoder capable of vectorizing audio and text.
[0101] The speech analysis model (510) may include a multimodal fusion module that enables integrated processing by fusing features of each modality. The multimodal fusion module may include one or more neural network layers for fusing features of each modality.
[0102] The speech analysis model (510) may include an inference module that learns the relationships between features of each modality and generates meaningful outputs. The process of learning the relationships between features may be implemented through an attention mechanism, and for this purpose, a transformer-based large language model (LLM) architecture may be used, but is not limited thereto. The speech analysis model (510) may also include an output generator for outputting final output features included in the speech analysis information. The output generator of the speech analysis model (510) may be responsible for inferring each different output feature of the speech analysis information.
[0103] The speech analysis model (510) can be trained using training data consisting of a text-audio collection in which correct values are labeled to text-audio pairs representing speech. For example, the training data may have correct values for features such as 'topic, sentence length, speech rate, tone, recency, urgency, emotion, language' labeled to text-audio pairs representing speech. The vehicle electronic device (1000) can prepare for speech analysis to be performed by training the speech analysis model (510) using the training data.
[0104] In one embodiment, the context analysis model (520) may be a Large Multimodal Model (LMM) capable of receiving an image (or video) and audio as input, processing them integrally, and outputting context analysis information. The image may refer to an image taken of the interior of a vehicle, and the audio may refer to speech by users inside the vehicle and sounds occurring inside the vehicle.
[0105] The context analysis model (520) may include one or more neural network layers for image processing and audio processing. For example, the context analysis model (520) may include an encoder. The context analysis model (520) may vectorize image data and extract features using a CNN-based image encoder. Alternatively, for example, the context analysis model (520) may convert audio data into spectrograms and then apply a CNN-based encoder, or vectorize audio data and extract features using a transformer-based audio encoder. Alternatively, for example, the context analysis model (520) may convert spoken audio into text and then vectorize the text and extract features using a transformer-based text encoder. Alternatively, for example, the encoder of the context analysis model (520) may be a universal encoder capable of vectorizing images, audio, and text.
[0106] The context analysis model (520) may include a multimodal fusion module that enables integrated processing by fusing features of each modality. The multimodal fusion module may include one or more neural network layers for fusing features of each modality.
[0107] The context analysis model (520) may include an inference module that learns the relationships between features of each modality and generates meaningful output. The process of learning the relationships between features may be implemented through an attention mechanism, and for this purpose, a transformer-based large language model (LLM) architecture may be used, but is not limited thereto. The context analysis model (520) may also include an output generator for outputting final output features included in the context analysis information. The output generator of the context analysis model (520) may be responsible for inferring each different output feature of the context analysis information.
[0108] The context analysis model (520) can be trained using training data consisting of a set of vehicle interior image-audio pairs labeled with correct values. For example, the training data may have correct values labeled for features such as 'passenger information, relationships between passengers, seat occupancy status, scene / situation, dangerous behavior recognition, passenger status' for the vehicle interior image-audio pairs. The vehicle electronic device (1000) can prepare to perform the context analysis task by training the context analysis model (520) using the training data.
[0109] FIG. 6 is a diagram illustrating speech analysis information generated by a vehicle electronic device according to one embodiment of the present disclosure.
[0110] In one embodiment, the vehicle electronic device (1000) can generate speech analysis information (620) based on the user's speech (610). The speech analysis information (620) can be obtained using a speech analysis model. Since the description of the speech analysis model has been provided above, a repetitive description is omitted for brevity.
[0111] Speech analysis information (620) may include features corresponding to multiple items related to speech. Features included in the speech analysis information (620) may include, for example, topic, sentence length, speech rate, tone of voice, recency, urgency, emotion, language, etc., but are not limited thereto.
[0112] For example, the vehicle electronic device (1000) can perform speech analysis on the user's speech (610) "Play music." The vehicle electronic device (1000) can perform speech analysis on "Play music" to obtain values for the features of the speech analysis information. Specific examples of feature-value pairs of the speech analysis information (620) for "Play music" may be Topic-Entertainment, Sentence Length-Concise, Speech Rate-Normal, Tone-Command, Recency-Low, Urgency-Medium, Emotion-Positive, and Language-Korean.
[0113] According to the values of the features of the speech analysis information (620), the vehicle electronic device (1000) can select the optimal model to respond to the speech (610). The vehicle electronic device (1000) can determine the optimal model to provide a response to the user by mapping the features of the speech analysis information (620) to the features of the model analysis data.
[0114] For example, regarding 'topic,' a model suitable for providing a response on that topic may be selected, and in a similar manner, each feature of the speech analysis information (620) may be mapped to model analysis data to select a response model suitable for the user's speech (610). Specifically, 'recency' refers to whether the user's speech (610) contains content regarding a recent issue; accordingly, whether a model has been trained on data containing the latest knowledge may be used to determine the response model. 'Urgency' refers to whether the user's speech (610) requires an urgent response; accordingly, whether a model provides a fast and concise response may be used to determine the response model.
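The way recency and urgency translate into model selection criteria can be sketched as a small rule table. The capability labels ("recent_training_data", "fast_concise_response") and the rule that only a "high" value creates a requirement are assumptions for this example; the feature values for "Play music" are taken from the example above.

```python
def derive_model_requirements(speech_features):
    """Turn speech analysis features into a set of required model capabilities."""
    requirements = set()
    # Speech about a recent issue calls for a model trained on recent data.
    if speech_features.get("recency") == "high":
        requirements.add("recent_training_data")
    # Urgent speech calls for a model that answers quickly and concisely.
    if speech_features.get("urgency") == "high":
        requirements.add("fast_concise_response")
    return requirements


def meets_requirements(model_capabilities, requirements):
    """Check whether a model's (assumed) capability labels cover every requirement."""
    return requirements <= set(model_capabilities)


play_music = {"topic": "entertainment", "recency": "low", "urgency": "medium"}
requirements = derive_model_requirements(play_music)
```

For "Play music" no special capability is required, so in this sketch any available model passes the check and other features, such as topic, would decide the selection.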
[0115] FIG. 7 is a diagram illustrating context analysis information generated by a vehicle electronic device according to one embodiment of the present disclosure.
[0116] In one embodiment, a vehicle electronic device (1000) may obtain context analysis information (720) based on at least one of a vehicle interior image (710) (or video) and a vehicle interior sound (712). The context analysis information (720) may be obtained using a context analysis model. Since a description of the context analysis model has been provided above, a repetitive description is omitted for brevity. The vehicle interior sound (712) may include speech of users inside the vehicle and other sounds occurring inside the vehicle.
[0117] Context analysis information (720) may include features corresponding to multiple items related to the context inside the vehicle. Features included in the context analysis information (720) may include, for example, passenger information, relationships between passengers, seat occupancy status, scene / situation, dangerous behavior recognition, passenger status, etc., but are not limited thereto. In FIG. 7, specific feature values of the context analysis information (720) are omitted.
[0118] For example, the feature 'Passenger Information' of the context analysis information (720) may have feature values identifying users inside the vehicle, such as the driver, Passenger A, and Passenger B. The 'Relationships Between Passengers' may have feature values describing how users inside the vehicle are related, such as family, lovers, friends, and business relationships. The 'Seat Occupancy Status' may have feature values related to the seats where passengers are located inside the vehicle, such as the driver's seat, passenger seat, and rear seat. The 'Scene / Situation' may have feature values related to situations inside the vehicle and interactions between passengers, such as general conversation, conflict, and handing over objects. The 'Dangerous Behavior Recognition' may have feature values related to dangerous passenger behaviors detected inside the vehicle, such as smartphone use, eye contact (distraction), drowsiness, and movement within the seat. The 'Passenger Status' may have feature values related to the individual status of each passenger, such as face direction, posture, seatbelt fastening status, and emotion.
[0119] According to the values of the features of the above context analysis information (720), the vehicle electronic device (1000) can select the optimal model to respond to the user's speech. The vehicle electronic device (1000) can determine the optimal model to provide a response to the user by mapping the features of the context analysis information (720) to the features of the model analysis data.
[0120] FIG. 8 is a diagram illustrating vehicle status information obtained by a vehicle electronic device according to one embodiment of the present disclosure.
[0121] In one embodiment, the vehicle electronic device (1000) can acquire vehicle status information (820). The vehicle status information (820) may include features corresponding to a plurality of items related to the driving of the vehicle.
[0122] Vehicle status information (820) may include application information. The vehicle electronic device (1000) may obtain application information from one or more applications (810) running while the vehicle is driving. One or more applications (810) may include, for example, a navigation app, a phone app, a music app, etc., but are not limited thereto. Application information may include execution status (e.g., running, running in the background, standby, etc.), events (e.g., push notification, reception of user input, execution of function, etc.), patterns (e.g., usage time, app type), and usage environment (e.g., Bluetooth connection, cellular network, etc.), but is not limited thereto.
[0123] Vehicle status information (820) may include sensor information. The vehicle electronic device (1000) may obtain sensor data from one or more sensors (812) mounted in the vehicle. One or more sensors mounted in the vehicle may include, for example, a speed sensor (e.g., Vehicle Speed Sensor; VSS), a location sensor (e.g., GPS), a communication-related sensor (e.g., a Bluetooth module, a cellular module (LTE / 5G / 6G), V2X (Vehicle to Everything)), a temperature sensor, a humidity sensor, an illuminance sensor, etc., but are not limited thereto.
[0124] Features included in the vehicle status information (820) may include, for example, the vehicle's speed, location, network status, environment information, navigation information, application information, etc., but are not limited thereto. In FIG. 8, specific feature values of the vehicle status information (820) are omitted.
[0125] For example, the 'speed' feature of the vehicle status information (820) may have information related to the vehicle's speed, such as the current speed and the average speed per section, as a feature value. The 'location' may have information related to the vehicle's location, such as the vehicle's current location, starting point, and destination, as a feature value. The 'network status' may have information related to the network, such as the network type, connection status, and network speed, as a feature value. The 'environment information' may have information related to the vehicle's internal or external environment, such as the temperature, humidity, light intensity, and external weather, as a feature value. The 'navigation information' may have information related to the vehicle's driving route, such as the starting point, current location, destination, route, traffic conditions, and travel time, as a feature value. The 'application information' may have information related to the actions that the vehicle electronic device (1000) can perform using applications, such as the type of application currently running, the types of executable applications, and the functions of each application, as a feature value.
[0126] According to the values of the features of the vehicle status information (820), the vehicle electronic device (1000) can select the optimal model to respond to the user's speech. The vehicle electronic device (1000) can determine the optimal model to provide a response to the user by mapping the features of the vehicle status information (820) to the features of the model analysis data.
[0127] Meanwhile, the vehicle electronic device (1000) may use a combination of features described in FIGS. 6 to 8 to determine a response model. For example, the vehicle electronic device (1000) may determine a model to respond to a user's speech by comprehensively mapping features of speech analysis information (620) of FIG. 6, context analysis information (720) of FIG. 7, or vehicle status information (820) of FIG. 8 to model analysis data.
[0128] FIG. 9 is a diagram illustrating an operation in which a vehicle electronic device according to one embodiment of the present disclosure selects an optimal model.
[0129] In one embodiment, the vehicle electronic device (1000) may obtain a model list (910) comprising a plurality of models trained to provide a response to a user. For example, the vehicle electronic device (1000) may load the model list (910) and identify that Model A, Model B, ..., Model N are response models. The models in the model list (910) may include models stored in the storage of the vehicle electronic device (1000) and operated in an on-device manner, and models stored in an external device (e.g., a server) and operated based on cloud computing.
[0130] The vehicle electronic device (1000) can identify which of the models in the model list (910) are available for use. The vehicle electronic device (1000) can check the response capability and response speed of the models and obtain information about the available models. For example, among Model A, Model B, ..., Model N, it can identify that {Model A, Model C, Model N} are currently available models.
[0131] The vehicle electronic device (1000) can identify whether model analysis data (920) exists for models in the model list (910). The model analysis data (920) may be obtained by analyzing the model's metadata, the speech analysis information, context analysis information, and vehicle status information used for the model's response, and the model's response itself. The vehicle electronic device (1000) can continuously generate and update the model analysis data (920) as the vehicle electronic device (1000) is used. However, for reasons such as a model being newly added to the model list (910) or insufficient data required for analysis, some models in the model list (910) may not have model analysis data (920). In such cases, as data accumulates through use of the vehicle electronic device (1000), the vehicle electronic device (1000) can perform model analysis and model response analysis so that model analysis data (920) is generated even for those models. For example, the vehicle electronic device (1000) can identify whether model analysis data (920) corresponding to the available models {Model A, Model C, Model N} exists. FIG. 9 illustrates an example in which model analysis data (920) exists for Model A and Model C and does not exist for Model N.
[0132] The vehicle electronic device (1000) can select a model to provide a response based on model analysis data (920) of available models, speech analysis information, context analysis information, and vehicle status information. For example, the vehicle electronic device (1000) can select Model A as the model to be used for the response from among Model A and Model C, for which model analysis data (920) exists. To do so, the vehicle electronic device (1000) can map the features included in the speech analysis information, context analysis information, and vehicle status information to the features of the model analysis data (920) of Model A, and likewise to the features of the model analysis data (920) of Model C. Based on the degree of matching with the feature values, the vehicle electronic device (1000) can then determine, among models having different characteristics, the model to respond to the user's speech.
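The selection flow described with reference to FIG. 9 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names, the layout of the analysis data, the numeric scores, the use of an average as the "degree of matching", and the choice to skip available models that have no analysis data are all assumptions made for the example.

```python
def matching_score(analysis: dict, request: dict) -> float:
    """Average the scores the analysis data records for the request's feature values."""
    scores = []
    for feature, value in request.items():
        per_value = analysis.get(feature, {})
        if value in per_value:
            scores.append(per_value[value])
    return sum(scores) / len(scores) if scores else 0.0


def select_model(model_list, available, analysis_data, request):
    """Return the best-matching available model that has analysis data, or None."""
    candidates = [m for m in model_list if m in available and m in analysis_data]
    if not candidates:
        return None
    return max(candidates, key=lambda m: matching_score(analysis_data[m], request))


model_list = ["Model A", "Model B", "Model C", "Model N"]
available = {"Model A", "Model C", "Model N"}  # Model B is assumed unavailable
analysis_data = {                              # Model N has no analysis data yet
    "Model A": {"topic": {"entertainment": 0.9}, "urgency": {"low": 0.8}},
    "Model C": {"topic": {"entertainment": 0.6}, "urgency": {"low": 0.7}},
}
request = {"topic": "entertainment", "urgency": "low"}
selected = select_model(model_list, available, analysis_data, request)
```

With these invented scores, Model A matches the request more closely than Model C, so it is the model returned.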
[0133] FIG. 10 is a diagram illustrating model response analysis data generated by a vehicle electronic device according to one embodiment of the present disclosure.
[0134] In one embodiment, the vehicle electronic device (1000) may acquire metadata of a plurality of models. Model metadata includes information describing the characteristics, performance, etc. of each model and may be provided by an operating entity that distributes and manages the model. Model metadata may include, for example, a model name, model size (e.g., number of parameters), latency, throughput, accuracy, multilingual support, a training data set, a training data date, etc., but is not limited thereto.
[0135] In one embodiment, the vehicle electronic device (1000) can generate model response analysis data by analyzing the responses of a plurality of models to an utterance. For example, since each of the plurality of models outputs a different response to the same utterance, the vehicle electronic device (1000) can analyze the model-specific responses of the plurality of models.
[0136] The vehicle electronic device (1000) can acquire metadata for each model and acquire model response analysis data based on a speech, speech analysis information corresponding to the speech, context analysis information and vehicle status information, and a response to the speech. For example, the vehicle electronic device (1000) can acquire metadata for Model A and acquire a response from Model A by inputting a speech, speech analysis information corresponding to the speech, context analysis information and vehicle status information corresponding to the speech, into Model A. Then, the vehicle electronic device (1000) can acquire metadata for Model B and acquire a response from Model B by inputting the same input that was input into Model A (for example, a speech, speech analysis information corresponding to the speech, context analysis information and vehicle status information corresponding to the speech) into Model B.
[0137] The vehicle electronic device (1000) can obtain the response of each model from a plurality of models and generate response analysis data. The response analysis data may include features corresponding to response analysis items. The features of the response analysis items may include, for example, model name, model size, response speed, multi-language support status, training data date, model response, amount of response, response accuracy, hyperparameter setting value, etc., but are not limited thereto. The vehicle electronic device (1000) can fill in the feature values of the features of the response analysis data based on the response and metadata of each model.
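As a rough sketch of how one response analysis record might be filled in from a model's metadata and a measured response, consider the following Python example. The field names, the `fake_model` stand-in, the metadata values, and the use of a word count as the "amount of response" are assumptions for illustration; response accuracy is omitted because the disclosure does not state how it is scored.

```python
import time


def analyze_response(metadata: dict, generate, utterance: str, hyperparameters: dict) -> dict:
    """Build one response analysis record from model metadata and a timed response."""
    start = time.perf_counter()
    response = generate(utterance, **hyperparameters)
    elapsed = time.perf_counter() - start
    return {
        "model_name": metadata["name"],
        "model_size": metadata["parameters"],
        "multilingual": metadata["multilingual"],
        "training_data_date": metadata["training_data_date"],
        "response": response,
        "response_amount": len(response.split()),  # word count as a stand-in measure
        "response_speed_s": elapsed,
        "hyperparameters": dict(hyperparameters),
    }


def fake_model(utterance: str, temperature: float = 1.0) -> str:
    """Stand-in for a real model call; always returns the same text."""
    return "There is a rest area about 30 km ahead."


record = analyze_response(
    {"name": "Model A", "parameters": 3_000_000_000, "multilingual": True,
     "training_data_date": "YYYY-MM"},  # placeholder metadata, not real values
    fake_model,
    "How much further to the next rest area?",
    {"temperature": 0.2},
)
```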
[0138] In one embodiment, when the vehicle electronic device (1000) enters an idle state, it may input speech and additional information (e.g., speech analysis information, context analysis information, vehicle status information) into a plurality of models and obtain responses to start response analysis. The vehicle electronic device (1000) may update model analysis data based on response analysis data.
[0139] In one embodiment, when generating model response analysis data, the vehicle electronic device (1000) can generate multiple responses by changing and setting hyperparameters that adjust the output of the model. For example, the vehicle electronic device (1000) can generate responses and generate response analysis data by changing the values of hyperparameters (e.g., Temperature, Top-K, Top-P, Repeat Penalty, etc.) used to adjust the quality and characteristics of the model output. For example, the hyperparameter 'Temperature' adjusts the randomness of the generated response; the lower the value, the more predictable and decisive the response is output, and the higher the value, the more random and creative the response is output.
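The effect of 'Temperature' can be shown with a small worked example. In many generative language models, the scores (logits) for candidate tokens are divided by the temperature before being converted to probabilities. The sketch below assumes that common formulation; the disclosure does not specify how any particular model applies the parameter, and the logit values are made up.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # invented scores for three candidate tokens
low = softmax_with_temperature(logits, 0.5)   # sharper distribution
high = softmax_with_temperature(logits, 2.0)  # flatter distribution
```

At the lower temperature most of the probability concentrates on the highest-scoring token, which corresponds to the more predictable output described above. At the higher temperature the probabilities are closer together, so sampling is more varied.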
[0140] Since response analysis data includes hyperparameter settings, the model analysis data generated using the response analysis data also includes hyperparameter settings. As hyperparameter settings can be applied differently for each utterance-response, different hyperparameter values may be applied even to the same model.
[0141] When selecting a model to respond, the vehicle electronic device (1000) can obtain a hyperparameter setting value mapped within the model analysis data based on at least one of speech analysis information, context analysis information, or vehicle status information, and dynamically set a hyperparameter that adjusts the output of the selected model.
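One possible way to look up hyperparameter settings from model analysis data is sketched below. The storage layout (settings keyed by a feature and feature value pair), the default values, and the rule that later matches override earlier ones are assumptions chosen for the example rather than details given in the disclosure.

```python
# Hypothetical default settings used when no mapping matches.
DEFAULTS = {"temperature": 0.7, "top_p": 0.9}


def hyperparameters_for(model_analysis: dict, features: dict) -> dict:
    """Start from defaults, then apply overrides recorded for each matching feature value."""
    settings = dict(DEFAULTS)
    mapped = model_analysis.get("hyperparameters", {})
    for feature, value in features.items():
        override = mapped.get((feature, value))
        if override:
            settings.update(override)
    return settings


model_a_analysis = {
    "hyperparameters": {
        ("urgency", "high"): {"temperature": 0.1},
        ("topic", "entertainment"): {"temperature": 1.0, "top_p": 0.95},
    }
}
urgent = hyperparameters_for(model_a_analysis, {"urgency": "high", "topic": "car"})
casual = hyperparameters_for(model_a_analysis, {"topic": "entertainment"})
```

Because the settings are looked up per utterance, the same model can run with different hyperparameter values from one utterance to the next.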
[0142] In one embodiment, when the vehicle electronic device (1000) generates model response analysis data, it may obtain user input indicating feedback on responses output from a plurality of models. For example, when the vehicle electronic device (1000) generates a response of Model A, it may generate a plurality of responses by applying weights so that any one of speech analysis information, context analysis information, or vehicle status information is reflected more strongly. For example, the vehicle electronic device (1000) may obtain, from Model A, a first response with a weight applied to speech analysis information, a second response with a weight applied to context analysis information, a third response with a weight applied to vehicle status information, etc. The vehicle electronic device (1000) may display the plurality of responses to the user and obtain feedback in which the user selects the preferred response. Based on the feedback from the user, the vehicle electronic device (1000) may determine at least one of a first weight corresponding to speech analysis information, a second weight corresponding to context analysis information, or a third weight corresponding to vehicle status information, and include the determined weight value in the response analysis data. The value of the weight can be adjusted according to the strength of the feedback.
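The feedback-driven weight adjustment can be illustrated with a short sketch. The update rule used here (add a step scaled by the feedback strength to the preferred source, then renormalize so the weights sum to 1) is an assumption for the example; the disclosure states only that the weights are determined from the feedback and adjusted according to its strength.

```python
def update_weights(weights: dict, preferred: str, strength: float, step: float = 0.1) -> dict:
    """Increase the weight of the preferred information source and renormalize to sum to 1."""
    if preferred not in weights:
        raise KeyError(preferred)
    updated = dict(weights)
    updated[preferred] += step * strength  # stronger feedback moves the weight further
    total = sum(updated.values())
    return {name: value / total for name, value in updated.items()}


# First, second, and third weights start out equal in this example.
weights = {"speech": 1 / 3, "context": 1 / 3, "vehicle_status": 1 / 3}

# The user preferred the response that emphasized context analysis information.
new_weights = update_weights(weights, preferred="context", strength=1.0)
```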
[0143] FIG. 11 is a drawing for explaining model analysis data generated by a vehicle electronic device according to one embodiment of the present disclosure.
[0144] In one embodiment, the vehicle electronic device (1000) can generate model analysis data by analyzing a plurality of models. The vehicle electronic device (1000) can generate model analysis data based on model response analysis data. The model response analysis data may be generated based on model metadata and the responses of the models. The model analysis data may be data in which features corresponding to model analysis items are determined based on the metadata of the models and the responses of the models.
[0145] The vehicle electronic device (1000) can select a model to respond to a speech by mapping speech analysis information, context analysis information, and vehicle status information to model analysis data.
[0146] For example, the feature 'topic' included in the speech analysis information can be mapped to the 'topic-specific accuracy', 'topic-specific response speed', and 'topic-specific response amount' of the model analysis data. Additionally, for example, if the feature value of the feature 'topic' included in the speech analysis information is entertainment, the feature value entertainment can be mapped to the features mapped to the topic within the model analysis data. Accordingly, the vehicle electronic device (1000) can select a model suitable for handling the entertainment topic within the model analysis data.
[0147] For example, 'sentence length,' a feature included in speech analysis information, can be mapped to 'accuracy by sentence length,' 'response speed by sentence length,' and 'response volume by sentence length' in the model analysis data.
[0148] In a similar manner, there may be analysis items of model analysis data that map to features of speech analysis information, features of context analysis information, and features of vehicle status information.
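The mapping between request features and analysis items can be sketched as a lookup table. The item names, the nested dictionary layout, and the numeric values below are hypothetical; they only mirror the 'topic' and 'sentence length' examples given above.

```python
# Hypothetical mapping from a request feature to the analysis items it keys into.
FEATURE_TO_ITEMS = {
    "topic": ["topic_accuracy", "topic_response_speed", "topic_response_amount"],
    "sentence_length": ["length_accuracy", "length_response_speed", "length_response_amount"],
}


def lookup_items(model_analysis: dict, feature: str, value: str) -> dict:
    """Return the analysis-item values recorded for one feature value, skipping missing ones."""
    result = {}
    for item in FEATURE_TO_ITEMS.get(feature, []):
        per_value = model_analysis.get(item, {})
        if value in per_value:
            result[item] = per_value[value]
    return result


model_a_analysis = {
    "topic_accuracy": {"entertainment": 0.9, "cooking": 0.5},
    "topic_response_speed": {"entertainment": 0.7},
    "length_accuracy": {"short": 0.95},
}
entertainment = lookup_items(model_a_analysis, "topic", "entertainment")
```

Repeating the lookup for each model's analysis data yields comparable values from which a model suited to the entertainment topic could be chosen.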
[0149] Model analysis data can be generated for each of the multiple models. For example, analysis data for Model A can be generated by analyzing the responses of Model A, and analysis data for Model B can be generated by analyzing the responses of Model B.
[0150] In one embodiment, when selecting a model to respond, the vehicle electronic device (1000) can obtain a hyperparameter setting value mapped within the model analysis data based on at least one of speech analysis information, context analysis information, or vehicle status information, and can dynamically set a hyperparameter that adjusts the output of the selected model.
[0151] FIG. 12 is a diagram illustrating an operation in which a vehicle electronic device according to one embodiment of the present disclosure connects a response model suited to the situation.
[0152] In one embodiment, the vehicle electronic device (1000) can acquire speech analysis information, context analysis information, and vehicle status information. The vehicle electronic device (1000) can generate model analysis data regarding the plurality of models by analyzing the responses of the plurality of models. The vehicle electronic device (1000) can determine the model to respond to the current speech by mapping features based on the model analysis data.
[0153] Features of speech analysis information, context analysis information, and vehicle status information can be mapped to features of model analysis data. For example, a model to respond to a speech can be determined based on features included in speech analysis information, context analysis information, and vehicle status information, such as speech length, urgency, language, topic, driving mode, user profile, traffic conditions, and occupant information. Since it is impractical to list every analysis item included in the model analysis data, exemplary situations regarding some dynamic model connections are described below.
[0154] 1) Dynamic model based on user utterance length: For shorter sentences, a smaller model is selected to give accurate and fast responses, while for longer sentences a larger model is selected to provide richer responses.
[0155] - When the utterance is short
[0156] User: How much further to the next rest area?
[0157] AI: There is a rest area about 30km away from your current location.
[0158] - When the utterance is long
[0159] User: Can you recommend a place to go with my family this weekend?
[0160] AI: I recommend a nearby park as a great place to visit with your family this weekend. You can enjoy a picnic there, and there is ample space for children to run around and play. Also, please note that there are many great restaurants near the park.
[0161] 2) Dynamic model based on user utterance urgency: Selection of a model capable of fast response when utterance urgency (mood) and sentence urgency (content) are high
[0162] - Cases of high urgency
[0163] User: Stop abruptly!
[0164] AI: Yes, I understand.
[0165] - Cases with low urgency
[0166] User: Where is the next rest area?
[0167] AI: You will reach your first rest area after traveling approximately 30km from your current location. This rest area is equipped with various restaurants and amenities, making it a good place to take a short break. If you plan to travel a longer distance, please note that the next rest area is located about 70km away. Have a safe journey.
[0168] 3) Dynamic model based on language: Selects a model well-trained in the corresponding language based on the language used by the user.
[0169] - If the user used Korean
[0170] User: How much further to the next rest area?
[0171] AI: There is a rest area about 30km away from your current location.
[0172] - If the user used English
[0173] User: How far is the next rest stop?
[0174] AI: The next rest stop is approximately 30 kilometers away from your current location.
[0175] 4) Dynamic model based on user utterance topic: Selects a model with expertise based on the topic of the sentence
[0176] - If the user's question is about cars
[0177] User: When is the time to change the car engine oil?
[0178] AI: Generally, it is recommended to replace car engine oil every 5,000 to 10,000 kilometers. However, this can vary depending on driving habits and the type of car, so it is advisable to refer to the manufacturer's recommendations or consult a professional.
[0179] - If the user's question is about cooking
[0180] User: Could you tell me how to make tomato pasta sauce?
[0181] AI: Of course. Here is how to make tomato pasta sauce...
[0182] 5) Dynamic model based on driving mode: Depending on the vehicle's driving mode (Sport / Eco), a smaller model prioritizing efficiency is selected in Eco mode, and a model prioritizing quick responsiveness is selected in Sport mode.
[0183] - In the case of Eco mode
[0184] User: What is the current weather like?
[0185] AI: The weather at your current location is rainy, and the temperature is 23 degrees.
[0186] - In the case of Sport mode
[0187] User: What is the current weather like?
[0188] AI: It is raining, and the temperature is 23 degrees.
[0189] 6) Dynamic model based on user profile: Model selection based on the accuracy of existing utterances and model responses, depending on individual preferences and model fit.
[0190] - When the user prefers a concise answer
[0191] User A: Where is the nearest rest area right now?
[0192] AI: The nearest rest area from your current location is Deokpyeong Rest Area.
[0193] - When the user prefers detailed explanations
[0194] User B: Where is the nearest rest area right now?
[0195] AI: The nearest rest area to your current location is Deokpyeong Rest Area. It offers a variety of snacks and amenities. In particular, the food court is popular for its menu featuring local specialties from the Icheon region.
[0196] 7) Dynamic model based on traffic conditions: Depending on road traffic conditions, a model that gives short, fast responses is selected when traffic is smooth and the average speed is high, and a model that provides more varied information at an adequate response speed is selected when the average speed is low.
[0197] - When traffic conditions are smooth
[0198] User: Where is the nearest gas station?
[0199] AI: The nearest gas station is 'ABC Gas Station', located 5 km away.
[0200] - When traffic is congested
[0201] User: Where is the nearest gas station?
[0202] AI: The nearest gas station is 'ABC Gas Station', located 5km away. The price of gasoline at this gas station is 1,700 won per liter, and diesel is 1,500 won per liter. Additionally, the car wash service provided at this gas station costs 5,000 won.
[0203] 8) Dynamic model based on passenger information: Model selection based on relationships between passengers; for example, selecting a model that provides a friendly tone of response if a child is included among the passengers.
[0204] Meanwhile, FIG. 12 shows the results of selecting different models when a model is selected based on speech length, urgency, language, topic, driving mode, user profile, traffic conditions, passenger information, etc. However, this is an exemplary case, and the model is dynamically determined based on speech analysis information, context analysis information, vehicle status information, and features of model analysis data.
[0205] FIG. 13 is a drawing for explaining how an electronic device for a vehicle according to one embodiment of the present disclosure operates in conjunction with a server.
[0206] In one embodiment, the vehicle electronic device (1000) can exchange data with a server (2000). For example, the vehicle electronic device (1000) can receive a speech input from a user and transmit the speech to the server (2000). The server (2000) can process the speech and return a response.
[0207] In one embodiment, the vehicle electronic device (1000) may transmit at least one of speech analysis information, context analysis information, or vehicle status information to the server (2000). In one embodiment, the vehicle electronic device (1000) may generate a prompt to be input into a generative model based on the speech analysis information, context analysis information, or vehicle status information. Specifically, the vehicle electronic device (1000) may select at least some of the features included in the speech analysis information (first features), the features included in the context analysis information (second features), and the features included in the vehicle status information (third features). The vehicle electronic device (1000) may generate a prompt including at least some of the features included in the speech analysis information, context analysis information, or vehicle status information, along with the original speech of the user, and transmit it to the server (2000). The server (2000) may generate and return a response based on the input from the vehicle electronic device (1000).
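Prompt generation of this kind might look like the following sketch. The prompt wording, the `(source, feature)` selection format, and the sample feature values are assumptions for illustration; the disclosure does not prescribe a prompt format.

```python
def build_prompt(utterance: str, speech: dict, context: dict, vehicle: dict, selected: list) -> str:
    """Compose a text prompt from the original utterance and a chosen subset of features."""
    sources = {"speech": speech, "context": context, "vehicle": vehicle}
    lines = []
    for source, feature in selected:
        # Only features that were explicitly selected and are present get included.
        if feature in sources[source]:
            lines.append(f"- {source}.{feature}: {sources[source][feature]}")
    header = "Consider the following in-vehicle information when answering:"
    return "\n".join([header, *lines, f"User: {utterance}"])


prompt = build_prompt(
    "Where is the nearest gas station?",
    speech={"language": "en", "urgency": "low"},       # first features
    context={"occupants": 2},                           # second features
    vehicle={"traffic": "congested", "speed_kmh": 20},  # third features
    selected=[("speech", "language"), ("vehicle", "traffic")],
)
```

Selecting only some features keeps the prompt short while still passing the server the context most relevant to the utterance.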
[0208] In one embodiment, the vehicle electronic device (1000) can obtain model metadata from the server (2000). In addition, the vehicle electronic device (1000) can generate response analysis data and model analysis data based on the responses of the models obtained from the server (2000).
[0209] In one embodiment, there may be one or more servers (2000). For example, Model A and Model B, which are operated based on cloud computing, may each be managed by different operating entities. In this case, the vehicle electronic device (1000) may interact with one or more servers (2000) corresponding to one or more models.
[0210] FIG. 14 is a block diagram illustrating the configuration of an electronic device for a vehicle according to one embodiment of the present disclosure.
[0211] Referring to FIG. 14, a vehicle electronic device (1000) according to one embodiment may include a communication interface (1100), a memory (1200), and at least one processor (1300).
[0212] The communication interface (1100) can perform data communication with other electronic devices under the control of the processor (1300).
[0213] The communication interface (1100) can perform data communication between a vehicle electronic device (1000) and another device by using at least one of data communication methods including, for example, wired LAN (e.g., Ethernet), wireless LAN (e.g., Wi-Fi), cellular network (e.g., 4G, 5G, etc.), Bluetooth, BLE (Bluetooth Low Energy), ZigBee, infrared communication (IrDA, infrared Data Association), NFC (Near Field Communication), RF communication, V2X (Vehicle to Everything), and various other types of known wireless / wired communication technologies. The communication interface (1100) may include a communication circuit designed to use the aforementioned communication methods.
[0214] The vehicle electronic device (1000) can use the communication interface (1100) to transmit data to and receive data from another electronic device in order to dynamically determine a response model. For example, the vehicle electronic device (1000) can transmit input data of the model (e.g., speech) and / or additional data (e.g., speech analysis information, context analysis information, vehicle status information) to another electronic device, and can receive a response output by the model from the other electronic device.
[0215] The memory (1200) may include various types of memory. The memory (1200) may include a main memory that stores data currently being processed in the vehicle electronic device (1000). For example, the main memory may include volatile memory such as RAM (Random Access Memory) or SRAM (Static Random Access Memory), but is not limited thereto. The memory (1200) may include a secondary memory that permanently stores large amounts of data (e.g., programs, system files, etc.). For example, the secondary memory may include non-volatile memory including at least one of a hard disk drive (HDD), a solid-state drive (SSD), an optical drive (e.g., CD), a flash drive, ROM (Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), and PROM (Programmable Read-Only Memory), but is not limited thereto.
[0216] The processor (1300) can control the overall operations of the vehicle electronic device (1000). For example, the processor (1300) can control the overall operations of the vehicle electronic device (1000) to dynamically select a response model and provide a response by executing one or more instructions of a program stored in memory (1200). There may be one or more processors (1300).
[0217] The processor (1300) may include a processing circuit. The processing circuit may include an operation unit that performs arithmetic and logical operations, a control unit that interprets instructions and controls execution, a register that stores data, and a cache memory, but is not limited thereto.
[0218] The processor (1300) may be composed of at least one of, for example, a Central Processing Unit (CPU), a Microprocessor, a Graphic Processing Unit (GPU), ASICs (Application Specific Integrated Circuits), DSPs (Digital Signal Processors), DSPDs (Digital Signal Processing Devices), PLDs (Programmable Logic Devices), FPGAs (Field Programmable Gate Arrays), an Application Processor (AP), a Neural Processing Unit (NPU), or an AI-dedicated processor designed with a hardware structure specialized for processing AI models, but is not limited thereto.
[0219] In one embodiment, there may be one or more processors (1300). If there are one or more processors (1300), the operations of the present disclosure may be performed by the one or more processors, individually or collectively, by executing instructions and / or programs stored in memory (1200). If the method according to one embodiment of the present disclosure includes a plurality of operations, the plurality of operations may be performed by one processor (1300) or by a plurality of processors (1300).
[0220] For example, when a method according to one embodiment includes a first operation, a second operation, and a third operation, the first to third operations may all be performed by a first processor, or some of the first to third operations may be performed by a first processor (e.g., a general-purpose processor) and the remaining operations may be performed by a second processor (e.g., an AI-dedicated processor). Here, operations for training / inference of an AI model may be performed by an AI-dedicated processor, which is an example of a second processor. However, the embodiments of the present disclosure are not limited thereto.
[0221] One or more processors according to the present disclosure may be implemented as a single-core processor or as a multi-core processor. When a method according to one embodiment of the present disclosure includes a plurality of operations, the plurality of operations may be performed by a single core or by a plurality of cores included in one or more processors.
[0222] FIG. 15 is a block diagram illustrating the configuration of an electronic device for a vehicle according to one embodiment of the present disclosure.
[0223] In one embodiment, the vehicle electronic device (1000) may include a communication interface (1100), a memory (1200), and a processor (1300). The communication interface (1100), memory (1200), and processor (1300) of FIG. 15 may correspond, respectively, to the communication interface (1100), memory (1200), and processor (1300) described with reference to FIG. 14. Accordingly, for brevity, repetitive descriptions are omitted.
[0224] Sensors (1400) can sense data related to the vehicle. Sensors (1400) may include, for example, a speed sensor (e.g., Vehicle Speed Sensor; VSS), a location sensor (e.g., GPS), a communication-related sensor (e.g., a Bluetooth module, a cellular module (LTE / 5G / 6G), V2X (Vehicle to Everything)), a temperature sensor, a humidity sensor, an illuminance sensor, etc., but are not limited thereto. Sensor data obtained from the sensors (1400) can be incorporated as features of vehicle status information.
[0225] The camera (1500) may be composed of one or more cameras capable of capturing images of the interior of the vehicle. The camera (1500) may include one or more cameras installed inside the vehicle for a Driver Monitoring System (DMS) or Occupant Monitoring System (OMS).
[0226] The input / output interface (1600) can enable connection and communication with an input device (e.g., a microphone) or an output device (e.g., a display, a speaker). For example, the input / output interface (1600) may provide a connection to a microphone, display device, or speaker through at least one of HDMI, DisplayPort, USB-C, a 3.5 mm audio jack, or HDMI ARC (Audio Return Channel). Meanwhile, the vehicle electronic device (1000) may directly include a microphone, a display, and a speaker.
[0227] The present disclosure relates to a method for dynamically determining a response model and providing a response based on analysis information related to a user's utterance and a vehicle, and to an electronic device for a vehicle.
[0228] According to one aspect of the present disclosure, a method of operating an electronic device for a vehicle may be provided.
[0229] The above method may include the step of obtaining first features corresponding to a plurality of items of speech analysis information based on the user's speech.
[0230] The above method may include the step of acquiring second features corresponding to a plurality of items of context analysis information representing the interior context of a vehicle, based on one or more images taken of the interior of a vehicle.
[0231] The above method may include the step of obtaining third features corresponding to a plurality of items of vehicle state information representing a vehicle state related to driving, based on sensor data from one or more sensors in the vehicle.
[0232] The above method may include the step of mapping at least one of the first features, the second features, or the third features to model analysis data and selecting a model to respond to the utterance among a plurality of models.
[0233] The above method may include the step of processing the utterance through the selected model to provide a response.
[0234] The above model analysis data may be data in which features corresponding to multiple analysis items are determined based on the metadata of the multiple models and the responses of the multiple models to previous utterances.
[0235] The step of acquiring the first features may include acquiring speech analysis information including the first features using a speech analysis model that takes the user's speech as input.
[0236] The step of acquiring the second features may include acquiring context analysis information including the second features using a context analysis model that takes an image, or an image and audio, as input.
[0237] The step of selecting the above model may include the step of identifying available models among the plurality of models.
[0238] The step of selecting the above model may include the step of identifying whether model analysis data corresponding to the above available models exists.
[0239] The step of selecting the above model may include the step of dynamically determining a model to respond to the utterance based on the existence of the above available models and the above model analysis data corresponding to the above available models.
[0240] The above method may include the step of dynamically setting hyperparameters that adjust the output of the selected model based on at least one of the speech analysis information, the context analysis information, or the vehicle status information.
[0241] The above method may include the step of generating recent response analysis data based on responses output by inputting at least one of the speech analysis information, the context analysis information, or the vehicle status information into the plurality of models.
[0242] The above method may include a step of updating the model analysis data based on the recent response analysis data.
[0243] The step of generating the recent response analysis data may include starting the recent response analysis based on the vehicle electronic device entering an idle state.
[0244] The step of generating the above recent response analysis data may include the step of obtaining a user input representing feedback on the responses output from the plurality of models.
[0245] The step of generating the recent response analysis data may include determining at least one of a first weight corresponding to the speech analysis information, a second weight corresponding to the context analysis information, or a third weight corresponding to the vehicle state information, based on the user input.
[0246] The step of generating the recent response analysis data may include applying at least one of the first weight, the second weight, or the third weight to the recent response analysis data.
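The feedback-driven weighting and update in [0241] to [0246] might look like the sketch below. It is an illustration under stated assumptions rather than the disclosed method: user feedback is assumed to arrive as counts of positive ratings per information type, the weights are assumed to be normalised with add-one smoothing, and the update is assumed to be a simple blend of stored and recent values. All function names are hypothetical.

```python
from typing import Dict, Tuple

# Order of the first, second, and third weights.
SOURCES = ("speech", "context", "vehicle_state")


def weights_from_feedback(positive_counts: Dict[str, int]) -> Tuple[float, ...]:
    """Turn per-source positive-feedback counts into weights that sum to 1.

    Add-one smoothing keeps every weight non-zero (an assumption).
    """
    raw = [positive_counts.get(source, 0) + 1 for source in SOURCES]
    total = sum(raw)
    return tuple(value / total for value in raw)


def weighted_recent_score(source_scores: Tuple[float, ...],
                          weights: Tuple[float, ...]) -> float:
    """Apply the three weights to the per-source scores of a recent response."""
    return sum(score * weight for score, weight in zip(source_scores, weights))


def update_analysis_value(stored: float, recent: float, rate: float = 0.5) -> float:
    """Blend a stored model analysis value with the recent one."""
    return (1.0 - rate) * stored + rate * recent


weights = weights_from_feedback({"speech": 1})
recent = weighted_recent_score((1.0, 0.0, 0.0), weights)
updated = update_analysis_value(stored=0.0, recent=recent)
```

Running such an update only when the device is idle, as [0243] describes, would keep the extra model invocations away from the time when the driver is waiting for a response.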
[0247] The above plurality of models may include at least one of a language model, an image model, or an audio model.
[0248] The step of providing the response may include providing at least one of an image or audio as the response, according to the type of the selected model among the plurality of models.
[0249] The above method may involve independently performing utterance analysis, context analysis, and vehicle status identification for each utterance to select the optimal model to be connected to each utterance.
[0250] The above method may include a step of determining whether to perform the utterance analysis, the context analysis, and the vehicle status identification based on whether each utterance includes a predefined word or sentence.
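Reading [0250] as a check for a predefined word or sentence in each utterance, the gating can be sketched as below. The trigger phrases and the function `should_analyze` are hypothetical, and simple case-insensitive substring matching is an assumption; the disclosure does not say how the match is performed.

```python
from typing import Iterable

# Hypothetical predefined words or sentences.
TRIGGER_PHRASES = ("hey car", "navigate to")


def should_analyze(utterance: str, triggers: Iterable[str] = TRIGGER_PHRASES) -> bool:
    """Return True if the utterance contains any predefined word or sentence."""
    text = utterance.casefold()
    return any(trigger.casefold() in text for trigger in triggers)
```

Only utterances for which `should_analyze` returns `True` would then go through utterance analysis, context analysis, and vehicle status identification.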
[0251] According to one aspect of the present disclosure, an electronic device for a vehicle may be provided. The electronic device for a vehicle may include at least one processor comprising a processing circuit; and a memory for storing instructions.
[0252] By executing the above instructions by the at least one processor, the vehicle electronic device can acquire first features corresponding to a plurality of items of speech analysis information based on the user's speech.
[0253] By executing the above instructions by the at least one processor, the vehicle electronic device can acquire second features corresponding to a plurality of items of context analysis information representing the vehicle interior context, based on one or more images taken of the vehicle interior.
[0254] By executing the above instructions by the at least one processor, the vehicle electronic device can acquire third features corresponding to a plurality of items of vehicle state information representing a vehicle state related to driving, based on sensor data from one or more sensors in the vehicle.
[0255] By executing the above instructions by the at least one processor, the vehicle electronic device can map at least one of the first features, the second features, or the third features to model analysis data and select a model to respond to the utterance among a plurality of models.
[0256] By executing the above instructions by the at least one processor, the vehicle electronic device can process the utterance through the selected model and provide a response.
[0257] The model analysis data may be data in which features corresponding to a plurality of analysis items are determined based on metadata of the plurality of models and on responses of the plurality of models to previous utterances.
[0258] By executing the above instructions by the at least one processor, the vehicle electronic device can obtain speech analysis information including the first features using a speech analysis model that takes the user's speech as input.
[0259] By executing the above instructions by the at least one processor, the vehicle electronic device can obtain context analysis information including the second features using a context analysis model that takes an image, or an image and audio, as input.
[0260] By executing the above instructions by the at least one processor, the vehicle electronic device can identify available models among the plurality of models.
[0261] By executing the above instructions by the at least one processor, the vehicle electronic device can identify whether model analysis data corresponding to the available models exists.
[0262] By executing the above instructions by the at least one processor, the vehicle electronic device can dynamically determine a model to respond to the utterance based on the existence of the available models and the model analysis data corresponding to the available models.
[0263] By executing the above instructions by the at least one processor, the vehicle electronic device can dynamically set a hyperparameter that adjusts the output of the selected model based on at least one of the speech analysis information, the context analysis information, or the vehicle state information.
[0264] By executing the above instructions by the at least one processor, the vehicle electronic device can generate recent response analysis data based on the responses output by inputting at least one of the speech analysis information, the context analysis information, or the vehicle state information into the plurality of models.
[0265] By executing the above instructions by the at least one processor, the vehicle electronic device can update the model analysis data based on the recent response analysis data.
[0266] By executing the above instructions by the at least one processor, the vehicle electronic device can start a recent response analysis based on the vehicle electronic device entering an idle state.
[0267] By executing the above instructions by the at least one processor, the vehicle electronic device can obtain user input representing feedback on responses output from the plurality of models.
[0268] By executing the above instructions by the at least one processor, the vehicle electronic device can determine at least one of a first weight corresponding to the speech analysis information, a second weight corresponding to the context analysis information, or a third weight corresponding to the vehicle state information, based on the user input.
[0269] By executing the above instructions by the at least one processor, the vehicle electronic device can apply at least one of the first weight, the second weight, or the third weight to the recent response analysis data.
[0270] The above plurality of models may include at least one of a language model, an image model, or an audio model.
[0271] By executing the above instructions by the at least one processor, the vehicle electronic device can provide at least one of an image or audio as a response according to the type of the selected model among the plurality of models.
[0272] By executing the above instructions by the at least one processor, the vehicle electronic device can independently perform utterance analysis, context analysis, and vehicle state identification for each utterance, and select the optimal model to be connected to each utterance.
[0273] Meanwhile, embodiments of the present disclosure may also be implemented in the form of a recording medium comprising computer-executable instructions, such as program modules executed by a computer. A computer-readable medium may be any available medium accessible by a computer and includes both volatile and non-volatile media, and both removable and non-removable media. Additionally, a computer-readable medium may include a computer storage medium. A computer storage medium includes both volatile and non-volatile, removable and non-removable media implemented by any method or technique for storing information, such as computer-readable instructions, data structures, program modules, or other data.
[0274] Additionally, computer-readable storage media may be provided in the form of non-transitory storage media. Here, 'non-transitory storage media' simply means that it is a tangible device and does not contain a signal (e.g., electromagnetic waves), and the term does not distinguish between cases where data is stored semi-permanently and cases where it is stored temporarily. For example, 'non-transitory storage media' may include a buffer where data is stored temporarily.
[0275] According to one embodiment, the method according to the various embodiments disclosed herein may be provided by being included in a computer program product. The computer program product may be traded between a seller and a buyer as a product. The computer program product may be distributed in the form of a device-readable storage medium (e.g., compact disc read-only memory (CD-ROM)), or distributed online (e.g., download or upload) through an application store or directly between two user devices (e.g., smartphones). In the case of online distribution, at least a portion of the computer program product (e.g., a downloadable app) may be temporarily stored or temporarily created on a device-readable storage medium, such as the memory of a manufacturer's server, an application store's server, or a relay server.
[0276] The foregoing description of the present disclosure is for illustrative purposes only, and those skilled in the art will understand that the present disclosure can be easily modified into other specific forms without altering its technical spirit or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.
[0277] The scope of the present disclosure is defined by the claims set forth below rather than by the detailed description above, and all modifications or variations derived from the meaning and scope of the claims and equivalent concepts thereof should be interpreted as being included within the scope of the present disclosure.
Claims
1. A method of operating an electronic device for a vehicle, the method comprising: a step (S210) of obtaining first features corresponding to a plurality of items of speech analysis information based on a user's utterance; a step (S220) of obtaining second features corresponding to a plurality of items of context analysis information representing a vehicle interior context, based on one or more images taken of the vehicle interior; a step (S230) of obtaining third features corresponding to a plurality of items of vehicle state information representing a vehicle state related to driving, based on sensor data from one or more sensors in the vehicle; a step (S240) of mapping at least one of the first features, the second features, or the third features to model analysis data to select a model to respond to the utterance from among a plurality of models; and a step (S250) of processing the utterance through the selected model to provide a response, wherein the model analysis data is data in which features corresponding to a plurality of analysis items are determined based on metadata of the plurality of models and responses of the plurality of models to previous utterances.
2. The method of claim 1, wherein the step of obtaining the first features includes obtaining the speech analysis information including the first features using a speech analysis model that takes the user's utterance as input, and wherein the step of obtaining the second features includes obtaining the context analysis information including the second features using a context analysis model that takes an image, or an image and audio, as input.
3. The method of any one of claims 1 to 2, wherein the step of selecting the model includes: a step of identifying available models among the plurality of models; a step of identifying whether model analysis data corresponding to the available models exists; and a step of dynamically determining the model to respond to the utterance based on the existence of the available models and the model analysis data corresponding to the available models.
4. The method of claim 3, further comprising a step of dynamically setting hyperparameters that adjust the output of the selected model based on at least one of the speech analysis information, the context analysis information, or the vehicle state information.
5. The method of claim 1, further comprising: a step of generating recent response analysis data based on responses output by inputting at least one of the speech analysis information, the context analysis information, or the vehicle state information into the plurality of models; and a step of updating the model analysis data based on the recent response analysis data.
6. The method of claim 5, wherein the step of generating the recent response analysis data includes initiating a recent response analysis based on the electronic device for the vehicle entering an idle state.
7. The method of claim 5, wherein the step of generating the recent response analysis data includes: a step of obtaining a user input representing feedback on the responses output from the plurality of models; a step of determining, based on the user input, at least one of a first weight corresponding to the speech analysis information, a second weight corresponding to the context analysis information, or a third weight corresponding to the vehicle state information; and a step of applying at least one of the first weight, the second weight, or the third weight to the recent response analysis data.
8. An electronic device (1000) for a vehicle, comprising: at least one processor (1300) including a processing circuit; and a memory (1200) storing instructions, wherein, when the instructions are executed by the at least one processor (1300), the electronic device (1000) for the vehicle: obtains, based on a user's utterance, first features corresponding to a plurality of items of utterance analysis information; obtains, based on one or more images of the vehicle interior, second features corresponding to a plurality of items of context analysis information representing a vehicle interior context; obtains, based on sensor data from one or more sensors in the vehicle, third features corresponding to a plurality of items of vehicle state information representing a vehicle state related to driving; maps at least one of the first features, the second features, or the third features to model analysis data to select a model to respond to the utterance from among a plurality of models; and processes the utterance through the selected model to provide a response, wherein the model analysis data is data in which features corresponding to a plurality of analysis items are determined based on metadata of the plurality of models and responses of the plurality of models to previous utterances.
9. The electronic device for a vehicle of claim 8, wherein, when the instructions are executed by the at least one processor, the electronic device for the vehicle: obtains the utterance analysis information including the first features using a speech analysis model that takes the user's utterance as input; and obtains the context analysis information including the second features using a context analysis model that takes an image, or an image and audio, as input.
10. The electronic device for a vehicle of any one of claims 8 to 9, wherein, when the instructions are executed by the at least one processor, the electronic device for the vehicle: identifies available models among the plurality of models; identifies whether model analysis data corresponding to the available models exists; and dynamically determines the model to respond to the utterance based on the existence of the available models and the model analysis data corresponding to the available models.
11. The electronic device for a vehicle of claim 10, wherein, when the instructions are executed by the at least one processor, the electronic device for the vehicle dynamically sets hyperparameters that adjust the output of the selected model based on at least one of the utterance analysis information, the context analysis information, or the vehicle state information.
12. The electronic device for a vehicle of claim 8, wherein, when the instructions are executed by the at least one processor, the electronic device for the vehicle: generates recent response analysis data based on responses output by inputting at least one of the utterance analysis information, the context analysis information, or the vehicle state information into the plurality of models; and updates the model analysis data based on the recent response analysis data.
13. The electronic device for a vehicle of claim 12, wherein, when the instructions are executed by the at least one processor, the electronic device for the vehicle starts a recent response analysis based on the electronic device for the vehicle entering an idle state.
14. The electronic device for a vehicle of claim 12, wherein, when the instructions are executed by the at least one processor, the electronic device for the vehicle: obtains a user input representing feedback on the responses output from the plurality of models; determines, based on the user input, at least one of a first weight corresponding to the utterance analysis information, a second weight corresponding to the context analysis information, or a third weight corresponding to the vehicle state information; and applies at least one of the first weight, the second weight, or the third weight to the recent response analysis data.
15. A computer-readable recording medium having recorded thereon a program for executing the method of any one of claims 1 to 7 on a computer.