Interaction method and apparatus, and vehicle
By acquiring and processing user intent within the vehicle cabin, and sending de-emotionalized polite language to another vehicle, communication misunderstandings between drivers are resolved, effective information transmission between vehicles is achieved, the driving experience is improved, and traffic accidents are reduced.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- YINWANG INTELLIGENT TECHNOLOGIES CO LTD
- Filing Date
- 2025-06-27
- Publication Date
- 2026-06-18
Description
Interaction method, device and vehicle
[0001] The present application claims priority to the Chinese Patent Application No. 202410869842.4, filed on June 28, 2024, entitled "Interaction method, device and vehicle", and the Chinese Patent Application No. 202411993129.7, filed on December 30, 2024, entitled "Interaction method, device and vehicle", the contents of which are incorporated herein by reference in their entirety.
TECHNICAL FIELD
[0002] The present application relates to the field of intelligent cockpits, and more particularly, to an interaction method, device and vehicle.
BACKGROUND
[0003] When a vehicle travels on a road, it interacts with nearby vehicles, for example through overtaking, following, lane changing, cutting in, sudden braking, or queuing. During these interactions, the drivers cannot perceive each other's emotions, which can easily cause misunderstanding and make a driver angry. In more serious cases, this can lead to retaliatory behavior by the driver (e.g., cutting in, stopping, etc.), resulting in a traffic accident.
SUMMARY
[0004] The present application provides an interaction method, device and vehicle, which can help avoid driver anger and retaliatory behavior when vehicles interact, and can thereby help avoid traffic accidents and improve the user's driving experience.
[0005] In a first aspect, an interaction method is provided, the method comprising: obtaining a first intention of a user in a vehicle cabin, the first intention being associated with another vehicle and the first intention being associated with a negative emotion of the user; performing de-emotional processing on the first intention to obtain a second intention; and sending the second intention to the other vehicle.
[0006] In combination with the first aspect, in some implementations of the first aspect, the performing de-emotional processing on the first intention to obtain a second intention comprises: performing de-emotional processing on the first intention to obtain a de-emotional processed intention; and adding polite language in the de-emotional processed intention to obtain the second intention.
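The two-step processing described above (de-emotional processing, then adding polite language) can be sketched as follows. This is a minimal illustration only: the application does not specify how emotional content is detected, so a fixed word list stands in for it, and the names `EMOTIONAL_WORDS`, `de_emotionalize`, `add_polite_language` and `to_second_intention` are hypothetical.

```python
import re

# Hypothetical vocabulary; a fixed word list stands in for whatever
# emotion detection an actual implementation would use.
EMOTIONAL_WORDS = {"idiot", "stupid", "damn"}


def de_emotionalize(first_intention: str) -> str:
    """Drop emotionally charged words and punctuation, keep the request itself."""
    words = re.findall(r"[A-Za-z']+", first_intention)
    kept = [w for w in words if w.lower() not in EMOTIONAL_WORDS]
    return " ".join(kept).lower()


def add_polite_language(neutral_intention: str) -> str:
    """Wrap the neutral request in a polite frame to form the second intention."""
    return f"Please {neutral_intention}, thank you."


def to_second_intention(first_intention: str) -> str:
    """First intention -> de-emotional processed intention -> second intention."""
    return add_polite_language(de_emotionalize(first_intention))
```

Under these assumptions, `to_second_intention("Stop tailgating me, idiot!")` yields `"Please stop tailgating me, thank you."`. The sketch does not handle an input that consists only of emotional words.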
[0007] In some implementations of the first aspect, before the first intention of the user in the vehicle cabin is acquired, the method further includes: acquiring a first voice instruction of the user; and determining the first intention according to the first voice instruction; and before the second intention is sent to the other vehicle, the method further includes: when the first voice instruction does not include slot information corresponding to the first intention, determining information of the other vehicle according to the first intention and data collected by a sensor outside the vehicle cabin; or when the first voice instruction includes the slot information corresponding to the first intention, determining the information of the other vehicle according to the slot information and the data collected by the sensor.
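The two branches above (slot information present or absent) can be illustrated with a small sketch. It assumes, purely for illustration, that the slot is a position label such as "rear", that the intention is a behavior label such as "tailgating", and that the exterior sensors yield a list of detected vehicles; `DetectedVehicle` and `resolve_other_vehicle` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DetectedVehicle:
    plate: str
    position: str   # hypothetical label relative to the ego vehicle, e.g. "rear"
    behavior: str   # hypothetical label for observed behavior, e.g. "tailgating"


def resolve_other_vehicle(
    intention: str,
    slot: Optional[str],
    detections: list[DetectedVehicle],
) -> Optional[DetectedVehicle]:
    """Pick the target vehicle from sensor detections.

    With slot information, match on the position the user named.
    Without it, fall back to the behavior implied by the intention.
    """
    if slot is not None:
        candidates = [d for d in detections if d.position == slot]
    else:
        candidates = [d for d in detections if d.behavior == intention]
    return candidates[0] if candidates else None
```

For example, with one vehicle cutting in ahead and one tailgating behind, an instruction without a slot and with the intention "tailgating" resolves to the rear vehicle, while the same instruction with the slot "front" resolves to the vehicle ahead.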
[0008] In some implementations of the first aspect, before the first intention of the user in the vehicle cabin is acquired, the method further includes: acquiring a second voice instruction of the user, the second voice instruction including the first intention; and the de-emotional processing of the first intention to obtain the second intention includes: inputting the second voice instruction, information of the vehicle, and surrounding environment information of the vehicle into an inference model to obtain the second intention and information of the other vehicle.
[0009] In some implementations of the first aspect, the first intention of the user in the vehicle cabin is acquired by: acquiring driving behavior of the user and driving records of the other vehicle; and determining the first intention according to the driving behavior of the user and the driving records of the other vehicle; and before the second intention is sent to the other vehicle, the method further includes: determining information of the other vehicle according to the first intention and data collected by a sensor outside the vehicle cabin.
[0010] In some implementations of the first aspect, before the second intention is sent to the other vehicle, the method further includes: determining that the second intention satisfies a road regulation.
[0011] In some implementations of the first aspect, the second intention is sent to the other vehicle by: sending the second intention to the other vehicle according to signal strength and / or signal quality of an environment in which the vehicle is located.
[0012] In some implementations of the first aspect, the second intention is sent to the other vehicle according to signal strength and / or signal quality of an environment in which the vehicle is located by: when the signal strength is greater than or equal to a preset signal strength, and / or the signal quality is greater than or equal to a preset signal quality, sending information of the other vehicle and the second intention to a cloud server, so that the cloud server sends the second intention to the other vehicle according to the information of the other vehicle.
[0013] With reference to the first aspect, in some implementations of the first aspect, the sending the second intention to the other vehicle comprises: sending the second intention to the other vehicle through near field communication when the signal strength is less than a preset signal strength and / or the signal quality is less than a preset signal quality.
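The channel choice described in the preceding paragraphs can be sketched as a simple threshold check. The text allows "and / or" combinations of the two conditions; the sketch below picks one reading as an assumption: the cloud path is used only when both signal strength and signal quality meet their presets, and near field communication is used otherwise. The function name, the units and the threshold values in the usage note are hypothetical.

```python
from enum import Enum


class Channel(Enum):
    CLOUD = "cloud"
    NEAR_FIELD = "near_field"


def select_channel(
    strength: float,
    quality: float,
    min_strength: float,
    min_quality: float,
) -> Channel:
    """Choose how to deliver the second intention.

    Assumed policy: both measurements must reach their preset values
    for the cloud server path; otherwise fall back to near field.
    """
    if strength >= min_strength and quality >= min_quality:
        return Channel.CLOUD
    return Channel.NEAR_FIELD
```

With hypothetical presets of -95 (strength, in dBm) and 0.5 (quality, normalized), `select_channel(-80.0, 0.9, -95.0, 0.5)` selects the cloud path, while `select_channel(-100.0, 0.9, -95.0, 0.5)` selects near field communication.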
[0014] The second aspect provides an interaction method, comprising: receiving a second intention from a vehicle, the second intention being an intention obtained by de-emotionalizing a first intention of a user in a cabin of the vehicle; and controlling a prompt device to prompt the user with the second intention.
[0015] With reference to the second aspect, in some implementations of the second aspect, the controlling the prompt device to prompt the user with the second intention comprises: determining a first driving opinion according to the second intention; and controlling the prompt device to prompt the user with the second intention and the first driving opinion.
[0016] With reference to the second aspect, in some implementations of the second aspect, the controlling the prompt device to prompt the user with the second intention comprises: controlling the prompt device to prompt the user with the second intention according to a state of the user in a main driving area.
[0017] With reference to the second aspect, in some implementations of the second aspect, the second intention indicates an abnormal driving behavior, and the method further comprises: controlling a vehicle light and / or a vehicle exterior projection to display first information, the first information being used to apologize to and / or thank the user in the vehicle.
[0018] The third aspect provides an interaction method, comprising: obtaining a first input of a user in a cabin of a vehicle, information of the vehicle, and environmental information around the vehicle; determining a first output according to the first input, the information of the vehicle, and the environmental information; and sending the first output to another vehicle.
[0019] With reference to the third aspect, in some implementations of the third aspect, the determining the first output according to the first input, the information of the vehicle, and the environmental information comprises: inputting the first input, the information of the vehicle, and the environmental information into an inference model to obtain the first output.
[0020] With reference to the third aspect, in some implementations of the third aspect, the information of the vehicle comprises at least one of a speed of the vehicle, a type of a road where the vehicle is located, and a type of a lane where the vehicle is located.
[0021] With reference to the third aspect, in some implementations of the third aspect, the environmental information comprises data collected by a sensor of the vehicle.
[0022] In a fourth aspect, an interaction apparatus is provided, which comprises units or modules for performing the method of any possible implementation of the first aspect to the third aspect.
[0023] In a fifth aspect, an interaction apparatus is provided, which comprises a processing unit and a storage unit, wherein the storage unit is configured to store instructions, and the processing unit is configured to execute the instructions stored in the storage unit, so that the interaction apparatus performs any possible method of the first aspect to the third aspect.
[0024] In a sixth aspect, an interaction system is provided, which comprises a perception system and a computing platform, wherein the computing platform comprises any possible apparatus of the fourth aspect or the fifth aspect.
[0025] In a seventh aspect, a vehicle is provided, which comprises the apparatus of the fourth aspect or the fifth aspect, or the system of the sixth aspect.
[0026] In an eighth aspect, a computer program product is provided, which comprises computer program code that, when run on a computer, causes the computer to perform any possible method of the first aspect to the third aspect.
[0027] It should be noted that the above computer program code can be stored in the first storage medium in whole or in part, wherein the first storage medium can be packaged together with the processor, or packaged separately from the processor, and the embodiments of the present application do not make a specific limitation in this regard.
[0028] In a ninth aspect, a computer readable medium is provided, which stores program code that, when run on a computer, causes the computer to perform any possible method of the first aspect to the third aspect.
[0029] In a tenth aspect, a chip system is provided, which comprises a processor, configured to invoke computer programs or computer instructions stored in a memory, so that the processor performs any possible method of the first aspect to the third aspect.
[0030] In combination with the tenth aspect, in a possible implementation, the processor is coupled with the memory through an interface.
[0031] In combination with the tenth aspect, in a possible implementation, the chip system further comprises the memory, and the memory stores the computer programs or computer instructions.
[0032] In an eleventh aspect, the present application provides an interaction method, comprising: obtaining a first input of a user in a cabin of a vehicle and environment information of surroundings of the vehicle, the environment information comprising information of another vehicle; determining a first output according to the first input and the environment information; and sending the first output to the other vehicle or a terminal device associated with the other vehicle.
[0033] Based on the above technical solution, the first output can be determined from the input of the user in the vehicle cabin and the environment information, and can then be sent to the other vehicle or the terminal device. In this way, the information isolation between the drivers of the two vehicles can be broken, and interaction between the vehicles can be realized.
[0034] In some possible implementation manners, the first input can comprise a voice input of the user.
[0035] In some possible implementation manners, the first input can comprise an input of the user to one or more components in the vehicle. For example, the one or more components can comprise, but are not limited to, a steering wheel, a light, a horn, an accelerator pedal or a brake pedal.
[0036] In some possible implementation manners, the first input can comprise physiological feature information of the user. For example, the physiological feature information can comprise blood pressure, heart rate, facial expression, etc. of the user.
[0037] In combination with the eleventh aspect, in some implementation manners of the eleventh aspect, the first input is associated with a negative emotion of the user, and the first output comprises a de-emotional output.
[0038] Based on the above technical solution, when the input of the user is associated with a negative emotion of the user, the information to be conveyed to the other vehicle can be reprocessed based on the input of the user and the environment information, so that de-emotional processing is realized. In this way, effective and friendly communication between vehicles can be achieved: effective information is transmitted without transmitting negative emotions, and the probability of conflict between drivers is reduced.
[0039] In some possible implementation manners, the first input comprises impolite language, and the first output further comprises polite language converted from the impolite language.
[0040] Based on the above technical solution, while the de-emotional processing is performed, the vehicle can also convert the impolite language of the user, so that the converted polite language is sent to the other vehicle. In this way, effective and friendly communication between vehicles can be further promoted: effective information is transmitted without transmitting negative emotions, and the probability of conflict between drivers is reduced.
[0041] In some possible implementation manners, the method further includes: controlling a prompting device to output a third output, the third output including an output result for soothing the user in the cabin.
[0042] Based on the technical solution above, when it is determined that the first input is associated with a negative emotion of the user, the output result for soothing the user in the cabin can be output based on the first input and the environmental information. In this way, while friendly interaction between the vehicle and another vehicle is realized, the user in the cabin can also be soothed, which helps to relieve the negative emotion of the user and improve the driving experience of the user.
[0043] For example, the third output can include soothing language.
[0044] For example, the third output can include an execution instruction for one or more components in the cabin.
[0045] With reference to the eleventh aspect, in some implementation manners of the eleventh aspect, determining the first output according to the first input and the environmental information includes: inputting the first input and the environmental information into a first inference model to obtain the first output.
[0046] Based on the technical solution above, the first input and the environmental information can be input into the first inference model, so that the first output can be obtained. In this way, end-to-end input and output can be realized through the first inference model.
[0047] For example, the first inference model can be a multi-modal model.
[0048] With reference to the eleventh aspect, in some implementation manners of the eleventh aspect, the method further includes: obtaining information of the vehicle; and wherein determining the first output according to the first input and the environmental information includes: determining the first output according to the first input, the information of the vehicle, and the environmental information.
[0049] Based on the technical solution above, through the input of the user in the cabin of the vehicle, the information of the vehicle, and the environmental information, the first output can be determined, so that the first output can be sent to another vehicle or a terminal device. In this way, information isolation between drivers in two vehicles can be broken, and interaction between vehicles can be realized. Meanwhile, by combining the information of the vehicle, the accuracy of the first output result can be further improved.
[0050] In some possible implementation manners, the information of the vehicle includes historical driving records of the vehicle.
[0051] In some possible implementation manners, the information of the vehicle includes one or more of a speed, an acceleration, and a position of the vehicle.
[0052] In some possible implementation manners of the eleventh aspect, the determining the first output according to the first input, the information of the vehicle, and the environmental information comprises: inputting the first input, the information of the vehicle, and the environmental information into a second inference model to obtain the first output.
[0053] According to the technical solution, the first input, the information of the vehicle, and the environmental information can be input into the second inference model, and thus the first output can be obtained. In this way, the end-to-end input and output can be implemented by using the second inference model.
[0054] In some possible implementation manners, the first inference model and the second inference model can be the same model.
[0055] In some possible implementation manners of the eleventh aspect, the method further comprises: determining information of another vehicle according to the first input and the environmental information.
[0056] According to the technical solution, the information of the other vehicle can be determined according to the input of the user in the vehicle cabin and the environmental information, and thus the first output can be sent to the other vehicle or the terminal device. In this way, the vehicle to be interacted with can be determined by combining the input of the user and the environmental information, which helps ensure that the vehicle to be interacted with is identified accurately.
[0057] In some possible implementation manners of the eleventh aspect, the determining the information of another vehicle according to the first input and the environmental information comprises: inputting the first input and the environmental information into a third inference model to obtain the information of another vehicle.
[0058] In some possible implementation manners, the first inference model, the second inference model, and the third inference model can be the same model.
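One way to picture the first, second and third inference models being the same model is a single interface that takes the first input and the environmental information, optionally takes the information of the vehicle, and returns both the first output and the information of the other vehicle. The sketch below is only an interface illustration under that assumption: `InferenceResult`, `StubInferenceModel`, the dictionary layouts and the canned messages are all hypothetical, and the stub stands in for a trained multi-modal model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InferenceResult:
    first_output: str
    other_vehicle: Optional[str]  # e.g. a license plate, if a target was identified


class StubInferenceModel:
    """Stand-in for a trained model; returns canned results for illustration."""

    def infer(
        self,
        first_input: str,
        environment: dict,
        vehicle_info: Optional[dict] = None,
    ) -> InferenceResult:
        # A real model would condition on first_input; the stub ignores it.
        # Hypothetical environment layout: {"vehicles": [{"plate": "..."}]}
        vehicles = environment.get("vehicles", [])
        target = vehicles[0]["plate"] if vehicles else None
        output = "Please keep a safe distance, thank you."
        # Role of the "second model": vehicle information refines the output.
        if vehicle_info and vehicle_info.get("speed_kmh", 0) > 100:
            output = "We are travelling fast; please keep a safe distance, thank you."
        return InferenceResult(first_output=output, other_vehicle=target)
```

Calling `infer` without `vehicle_info` corresponds to the first and third model roles, and calling it with `vehicle_info` corresponds to the second.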
[0059] In some possible implementation manners of the eleventh aspect, the information of the other vehicle comprises information of a license plate of the other vehicle; and the sending the first output to the other vehicle or the terminal device comprises: sending the first output and the information of the license plate of the other vehicle to a cloud server, so that the cloud server sends the first output to the other vehicle or the terminal device based on the information of the license plate of the other vehicle.
[0060] According to the technical solution, the vehicle can send the first output and the information of the license plate of the other vehicle to the cloud server, so that the cloud server can send the first output to the other vehicle by using the information of the license plate of the other vehicle. By forwarding the information through the cloud server, the information isolation between the drivers of the two vehicles can be broken, and the interaction between the vehicles can be implemented.
[0061] In some possible implementations, the cloud server stores an association between a license plate of a vehicle and identification information of a terminal device associated with that vehicle (e.g., a mobile phone or the vehicle itself).
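The license-plate-based forwarding described above can be sketched as a lookup table on the cloud server. This is an illustrative sketch, not the server design of the application: `CloudRelay`, its methods, the plate normalization and the in-memory `outbox` (standing in for an actual push to the terminal) are all assumptions.

```python
from typing import Optional


class CloudRelay:
    """Toy relay: maps license plates to terminal identifiers and forwards messages."""

    def __init__(self) -> None:
        self._terminal_by_plate: dict[str, str] = {}
        self.outbox: list[tuple[str, str]] = []  # (terminal_id, message)

    @staticmethod
    def _normalize(plate: str) -> str:
        # Assumed normalization so "ab 1234" and "AB1234" match.
        return plate.replace(" ", "").upper()

    def register(self, plate: str, terminal_id: str) -> None:
        """Store the association between a plate and a terminal identifier."""
        self._terminal_by_plate[self._normalize(plate)] = terminal_id

    def forward(self, plate: str, first_output: str) -> Optional[str]:
        """Queue the first output for the terminal tied to the plate, if any."""
        terminal_id = self._terminal_by_plate.get(self._normalize(plate))
        if terminal_id is not None:
            self.outbox.append((terminal_id, first_output))
        return terminal_id
```

A sending vehicle would call `forward` with the recognized plate and the first output; a `None` return indicates that no terminal is registered for that plate, in which case a fallback such as short-distance communication could be considered.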
[0062] With reference to the eleventh aspect, in some implementations of the eleventh aspect, sending the first output to the other vehicle or the terminal device includes sending the first output to the other vehicle through a short-distance communication technology.
[0063] With reference to the eleventh aspect, in some implementations of the eleventh aspect, the method further includes determining a first control instruction according to the first input and the environmental information, the first control instruction being associated with one or more actuators, and controlling the one or more actuators to execute the first control instruction.
[0064] Based on the technical solutions described above, the control instruction can also be obtained according to the first input and the environmental information. In this way, by executing the control instruction on the one or more actuators, the user can be helped to perform a corresponding operation, such as editing a light language, expressing gratitude, expressing apologies, and the like.
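Executing a control instruction on one or more actuators can be sketched as a dispatch from actuator names to handlers. The actuator names, command strings and the `ControlInstruction` and `execute` names below are hypothetical; the application does not define an instruction format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ControlInstruction:
    actuator: str  # hypothetical actuator name, e.g. "headlights"
    command: str   # hypothetical command, e.g. "flash_twice"


def execute(
    instructions: list[ControlInstruction],
    actuators: dict[str, Callable[[str], None]],
) -> list[ControlInstruction]:
    """Run each instruction on its actuator; return those with no matching actuator."""
    unhandled = []
    for instruction in instructions:
        handler = actuators.get(instruction.actuator)
        if handler is None:
            unhandled.append(instruction)
        else:
            handler(instruction.command)
    return unhandled
```

For instance, an apology expressed through a light language could be represented as `ControlInstruction("headlights", "flash_twice")`, with the `"headlights"` handler supplied by the vehicle's lighting controller.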
[0065] The twelfth aspect provides an interaction method, which includes obtaining a first output of another vehicle and environmental information around the vehicle, determining a second output according to the first output and the environmental information, and controlling a prompt device to output the second output.
[0066] Based on the technical solutions described above, the vehicle can obtain the second output according to the output of the other vehicle and the environmental information around the vehicle, and control the prompt device to output the second output. In this way, the information isolation between drivers of the two vehicles can be broken, and the interaction between the vehicles can be implemented.
[0067] With reference to the twelfth aspect, in some implementations of the twelfth aspect, determining the second output according to the first output and the environmental information includes determining the second output according to data collected by a sensor in a cabin of the vehicle, the first output, and the environmental information.
[0068] Based on the technical solutions described above, the data collected by the sensor in the cabin of the vehicle can be combined when the second output is determined. In this way, the second output can be more easily accepted by a user in the current cabin, and the driving experience of the user can be improved.
[0069] With reference to the twelfth aspect, in some implementations of the twelfth aspect, determining the second output according to the first output and the environmental information includes determining the second output according to first information of a driver in the vehicle, the first output, and the environmental information, the first information including one or more of driving proficiency, driving habits, or physiological characteristic information.
[0070] Based on the above technical solution, driver information can be incorporated when determining the second output. This allows the second output to better match the driver's profile, resulting in different outputs for different driver profiles. This helps improve the vehicle's intelligence and enhances the user's driving experience.
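How driver information might change the second output can be illustrated with a toy rule: the same received first output is passed through unchanged for an experienced driver, while a novice driver additionally receives a concrete suggestion when the environment allows it. The proficiency levels, the `right_lane_clear` flag, the suggestion wording and the names `DriverInfo` and `determine_second_output` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriverInfo:
    proficiency: str  # hypothetical levels: "novice" or "experienced"


def determine_second_output(
    first_output: str,
    environment: dict,
    driver: DriverInfo,
) -> str:
    """Tailor the prompt shown to the receiving driver to that driver's profile."""
    if driver.proficiency == "novice" and environment.get("right_lane_clear"):
        suggestion = "Suggestion: signal and move to the right lane when it is safe."
        return f"{first_output} {suggestion}"
    return first_output
```

Under these assumptions, two drivers receiving the same first output in the same environment can be prompted differently, which is the behavior the paragraph above describes.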
[0071] With reference to the twelfth aspect, in some implementations of the twelfth aspect, determining the second output according to the first output and the environmental information includes: determining the second output according to historical driving records of other vehicles around the vehicle, the first output, and the environmental information.
[0072] Based on the above technical solution, the historical driving records of other vehicles around the vehicle can be considered when determining the second output. This makes the second output more accurate and more acceptable to users in the cabin.
[0073] With reference to the twelfth aspect, in some implementations of the twelfth aspect, determining the second output according to the first output and the environmental information includes: inputting the first output and the environmental information into an inference model to obtain the second output.
[0074] In a thirteenth aspect, the present application provides an interaction apparatus, comprising: an acquisition unit configured to acquire a first input of a user in a cabin of a vehicle and environmental information around the vehicle, the environmental information including information of another vehicle; a determining unit configured to determine a first output according to the first input and the environmental information; and a sending unit configured to send the first output to the other vehicle or a terminal device, the terminal device being associated with the other vehicle.
[0075] With reference to the thirteenth aspect, in some implementations of the thirteenth aspect, the first input is associated with a negative emotion of the user, and the first output includes a de-emotional output.
[0076] With reference to the thirteenth aspect, in some implementations of the thirteenth aspect, the determining unit is configured to: input the first input and the environmental information into a first inference model to obtain the first output.
[0077] With reference to the thirteenth aspect, in some implementations of the thirteenth aspect, the acquisition unit is further configured to acquire information of the vehicle; and the determining unit is configured to: determine the first output according to the first input, the information of the vehicle, and the environmental information.
[0078] With reference to the thirteenth aspect, in some implementations of the thirteenth aspect, the determining unit is configured to: input the first input, the information of the vehicle, and the environmental information into a second inference model to obtain the first output.
[0079] With reference to the thirteenth aspect, in some implementations of the thirteenth aspect, the determining unit is further configured to: determine information of the other vehicle according to the first input and the environmental information.
[0080] With reference to the thirteenth aspect, in some implementations of the thirteenth aspect, the determining unit is configured to: input the first input and the environment information into a third inference model to obtain the information of the other vehicle.
[0081] With reference to the thirteenth aspect, in some implementations of the thirteenth aspect, the information of the other vehicle includes information of a license plate of the other vehicle; and the sending unit is configured to: send the first output and the information of the license plate of the other vehicle to the cloud server, so that the cloud server sends the first output to the other vehicle or the terminal device based on the information of the license plate of the other vehicle.
[0082] With reference to the thirteenth aspect, in some implementations of the thirteenth aspect, the apparatus further includes a control unit, and the determining unit is further configured to determine a first control instruction according to the first input and the environment information, the first control instruction being associated with one or more actuators; and the control unit is configured to control the one or more actuators to execute the first control instruction.
[0083] The fourteenth aspect provides an interaction apparatus, which includes: an obtaining unit configured to obtain a first output of another vehicle and environment information around the vehicle; a determining unit configured to determine a second output according to the first output and the environment information; and a control unit configured to control a prompting apparatus to output the second output.
[0084] With reference to the fourteenth aspect, in some implementations of the fourteenth aspect, the determining unit is configured to determine the second output according to data collected by a sensor in a cabin of the vehicle, the first output and the environment information.
[0085] With reference to the fourteenth aspect, in some implementations of the fourteenth aspect, the determining unit is configured to determine the second output according to first information of a driver in the vehicle, the first output and the environment information, the first information including one or more of driving proficiency, driving habit or physiological characteristic information.
[0086] With reference to the fourteenth aspect, in some implementations of the fourteenth aspect, the determining unit is configured to determine the second output according to historical driving records of other vehicles around the vehicle, the first output and the environment information.
[0087] With reference to the fourteenth aspect, in some implementations of the fourteenth aspect, the determining unit is configured to input the first output and the environment information into an inference model to obtain the second output.
[0088] The fifteenth aspect provides an interaction apparatus, which includes a processing unit and a storage unit, wherein the storage unit is configured to store instructions, and the processing unit executes the instructions stored in the storage unit, so that the interaction apparatus executes any possible method in the eleventh aspect or the twelfth aspect.
[0089] In a sixteenth aspect, an interaction system is provided, the system comprising a perception system and a computing platform, wherein the computing platform comprises any possible apparatus of the thirteenth aspect to the fifteenth aspect.
[0090] In a seventeenth aspect, a vehicle is provided, the vehicle comprising any possible apparatus of the thirteenth aspect to the fifteenth aspect, or comprising the system of the sixteenth aspect.
[0091] In an eighteenth aspect, a computer program product is provided, the computer program product comprising: computer program code which, when run on a computer, causes the computer to perform any possible method of the eleventh aspect or the twelfth aspect.
[0092] It should be noted that the computer program code can be stored in whole or in part on a first storage medium, wherein the first storage medium can be packaged together with the processor or packaged separately from the processor, and the embodiments of the present application do not make a specific limitation in this regard.
[0093] In a nineteenth aspect, a computer readable medium is provided, the computer readable medium storing program code which, when run on a computer, causes the computer to perform any possible method of the above eleventh aspect or the twelfth aspect.
[0094] In a twentieth aspect, the embodiments of the present application provide a chip system, the chip system comprising a processor, configured to invoke a computer program or computer instructions stored in a memory, so as to cause the processor to perform any possible method of the above eleventh aspect or the twelfth aspect.
[0095] In combination with the twentieth aspect, in a possible implementation manner, the processor is coupled with the memory through an interface.
[0096] In combination with the twentieth aspect, in a possible implementation manner, the chip system further comprises the memory, and the memory stores the computer program or the computer instructions.
BRIEF DESCRIPTION OF DRAWINGS
[0097] FIG. 1 is a functional block diagram of a vehicle according to an embodiment of the present application.
[0098] FIGS. 2A-2C are schematic diagrams of an interaction scenario according to an embodiment of the present application.
[0099] FIG. 3 is another schematic diagram of an interaction scenario according to an embodiment of the present application.
[0100] FIG. 4 is another schematic diagram of an interaction scenario according to an embodiment of the present application.
[0101] FIG. 5 is another schematic diagram of an interaction scenario according to an embodiment of the present application.
[0102] FIGS. 6A-6B are further schematic diagrams of an interaction scenario according to an embodiment of the present application.
[0103] FIG. 7 is another schematic diagram of an interaction scenario according to an embodiment of the present application.
[0104] FIGS. 8A-8C are further schematic diagrams of an interaction scenario according to an embodiment of the present application.
[0105] FIG. 9 is a schematic flowchart of an interaction method according to an embodiment of the present application.
[0106] FIG. 10 is another schematic flowchart of an interaction method according to an embodiment of the present application.
[0107] FIG. 11 is another schematic flowchart of an interaction method according to an embodiment of the present application.
[0108] FIG. 12 is a system architecture diagram according to an embodiment of the present application.
[0109] FIG. 13 is another system architecture diagram according to an embodiment of the present application.
[0110] FIG. 14 is a schematic flowchart of an interaction method according to an embodiment of the present application.
[0111] FIG. 15 is a system architecture diagram according to an embodiment of the present application.
[0112] FIG. 16 is a schematic block diagram of an interaction apparatus according to an embodiment of the present application.
[0113] FIG. 17 is a schematic block diagram of an interaction apparatus according to an embodiment of the present application.
DETAILED DESCRIPTION
[0114] The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings. In the description of the embodiments of the present application, unless otherwise specified, " / " means or, for example, A / B can mean A or B; in this document, "and / or" merely describes an association relationship of associated objects, which means that there can be three relationships, for example, A and / or B, which means that there are three cases of A alone, A and B together, and B alone.
[0115] The prefix words such as "first", "second" are used in the embodiments of the present application only to distinguish different description objects, and have no limiting effect on the position, order, priority, quantity or content of the described objects. The use of prefix words such as ordinal numbers in the embodiments of the present application does not constitute a limitation on the described objects, and the description of the described objects should be referred to the description of the context in the claims or embodiments, and should not constitute redundant limitation because of the use of such prefix words. In addition, in the description of the embodiments, unless otherwise stated, the meaning of "plurality" is two or more than two.
[0116] FIG. 1 is a functional block diagram of a vehicle 100 according to an embodiment of the present application. The vehicle 100 can include a perception system 110, a computing platform 120, a display device 130, and a sound emitting device 140, wherein the perception system 110 can include one or more sensors that sense information about the environment around the vehicle 100. For example, the perception system 110 can include a positioning system, which can be a global positioning system (GPS), a Beidou system, or other positioning systems. The perception system 110 can also include one or more of an inertial measurement unit (IMU), a laser radar, a millimeter wave radar, an ultrasonic radar, and a camera.
[0117] Some or all functions of the vehicle 100 can be controlled by the computing platform 120. The computing platform 120 can include one or more processors, such as processors 121 through 12n (n is a positive integer), which are circuits capable of processing signals. In one implementation, the processors can be circuits capable of reading and executing instructions, such as a central processing unit (CPU), a microprocessor, a graphics processing unit (GPU) (which can be understood as a kind of microprocessor), a digital signal processor (DSP), or the like. In another implementation, the processors can be circuits that implement a certain function through the logic relationship of hardware circuits, where the logic relationship is fixed or reconfigurable. For example, the processors can be hardware circuits implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD) such as a field programmable gate array (FPGA). In a reconfigurable hardware circuit, the process in which the processor loads a configuration file to configure the hardware circuit can be understood as the process in which the processor loads instructions to implement the functions of some or all of the above units. In addition, the processor can also be a hardware circuit designed for artificial intelligence, which can be understood as a kind of ASIC, such as a neural network processing unit (NPU), a tensor processing unit (TPU), a deep learning processing unit (DPU), or the like. In addition, the computing platform 120 can further include a memory for storing instructions, and some or all of the processors 121 through 12n can call and execute the instructions in the memory to implement corresponding functions.
[0118] The in-cabin display devices 130 fall mainly into two categories: the first is the in-vehicle display screen; the second is the projection display, such as the head-up display (HUD). An in-vehicle display screen is a physical display screen and an important component of the in-vehicle infotainment system. Multiple display screens can be installed in the cabin, such as a digital instrument cluster display, a central control screen, a display screen in front of the front passenger (also called the front-row passenger), a display screen in front of the left rear passenger, and a display screen in front of the right rear passenger; even a car window can be used as a display screen. A head-up display, also known as a head-up display system, is mainly used to display driving information such as speed and navigation on a display device in front of the driver (such as the windshield). This reduces the time the driver spends shifting their line of sight, avoids the pupil changes caused by such shifts, and improves driving safety and comfort. Examples of HUDs include combiner-HUD (C-HUD) systems, windshield-HUD (W-HUD) systems, and augmented reality HUD (AR-HUD) systems. It should be understood that HUDs can also evolve into other types of systems as technology progresses, and this application does not limit them.
[0119] The sound emitting device 140 can be a loudspeaker, an audio system, a horn, or the like.
[0120] As mentioned earlier, vehicles interact with other vehicles on the road. These interactions include overtaking, following, changing lanes, cutting in, sudden braking, and queuing. During these interactions, drivers cannot perceive each other's emotions, which can easily lead to misunderstandings and anger. In more serious cases, this can result in retaliatory behavior (e.g., cutting off other vehicles or slamming on the brakes), causing traffic accidents.
[0121] This application provides an interaction method, device, and vehicle that locates vehicles based on information described in natural language, achieving the effect of establishing information connections. Simultaneously, it allows for secondary processing of user interaction information to achieve effective and friendly communication, conveying useful information without transmitting negative emotions. This helps prevent drivers from experiencing anger or retaliatory behavior during vehicle interactions, thereby avoiding traffic accidents and improving the user's driving experience.
[0122] For example, Figures 2A-2C show schematic diagrams of the interactive scenarios provided in the embodiments of this application.
[0123] As shown in Figure 2A, when driver A, driving vehicle 100, notices vehicle 200 changing lanes from its current lane to the lane where vehicle 100 is located, driver A will slow down vehicle 100. This may cause driver A to feel dissatisfied with driver B inside vehicle 200. At this time, driver A may honk the horn of vehicle 100 to express their dissatisfaction.
[0124] If driver B in vehicle 200 hears vehicle 100 honking, driver B can infer that driver A in vehicle 100 is unhappy. At this point, driver B can issue a voice message: "Please tell the driver behind me, I'm sorry, there's an emergency at home."
[0125] After receiving voice information from the user, vehicle 200 can first convert the voice information into text content through an automatic speech recognition (ASR) module. Vehicle 200 can then understand the user's true intention and current emotion based on the text content.
[0126] For example, a natural language understanding (NLU) engine can be used to obtain the user's intention and slot. For instance, the user's intention might be "I apologize because of a family emergency," the slot might be "the driver behind," and the user's current emotion might be "apologizing." Vehicle 200 can determine the license plate information of vehicle 100 using images captured by an external camera. Vehicle 200 can then send the user's intention and vehicle 100's license plate information to a cloud server. The cloud server can then locate vehicle 100's address based on its license plate information and, according to that address, send the user's intention to vehicle 100. Simultaneously, in response to receiving the user's intention, vehicle 200 can issue a voice message to the user, "Okay, I'll tell him now."
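As a non-limiting illustration, the relay step described above (look up the target vehicle's address by license plate, then forward the intent to that address) can be sketched as follows. The names `IntentMessage` and `CloudRelay`, the plate string, and the address string are hypothetical and are not taken from this application; the actual transport between the vehicles and the cloud server is not specified here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class IntentMessage:
    """What the sending vehicle uploads (hypothetical structure)."""
    intent: str        # e.g. "I apologize because of a family emergency"
    target_plate: str  # license plate read from the external camera image


@dataclass
class CloudRelay:
    """Toy stand-in for the cloud server: maps plates to vehicle addresses."""
    registry: Dict[str, str] = field(default_factory=dict)
    delivered: List[Tuple[str, str]] = field(default_factory=list)

    def register(self, plate: str, address: str) -> None:
        """Record the address at which a vehicle with this plate is reachable."""
        self.registry[plate] = address

    def forward(self, message: IntentMessage) -> bool:
        """Look up the target address by plate and deliver the intent."""
        address = self.registry.get(message.target_plate)
        if address is None:
            return False  # unknown plate: nothing is delivered
        self.delivered.append((address, message.intent))
        return True


relay = CloudRelay()
relay.register("PLATE-100", "addr-vehicle-100")  # made-up plate and address
sent = relay.forward(
    IntentMessage(
        intent="I apologize because of a family emergency",
        target_plate="PLATE-100",
    )
)
```

In this sketch only the intent and the target plate leave the sending vehicle, which mirrors the scenario in which vehicle 200 sends the user's intention and vehicle 100's license plate information to the cloud server.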
[0127] Optionally, vehicle 200 can determine its own abnormal driving records and vehicle 100's driving records from data collected by sensors outside the cabin. For example, if vehicle 200 detects that, while vehicle 200 is changing from the right lane to the left lane, vehicle 100 brakes suddenly (e.g., vehicle 100's deceleration is greater than or equal to a preset deceleration) and honks its horn or switches between low and high beams, vehicle 200 can determine that vehicle 100 was forced to give way to vehicle 200 rather than yielding voluntarily. After obtaining the user's intention, vehicle 200 can determine that the user's intention includes an expression of apology. Based on vehicle 200's abnormal driving records and vehicle 100's driving records, vehicle 200 can send the user's intention to the cloud server, and also send a message of gratitude (e.g., sending a heart to thank driver A for giving way). At this time, vehicle 200 can issue a voice message to the user, "Okay, I'll tell him now, and I'll also send him a heart for you."
[0128] After receiving, from the cloud server, the intent and the message of gratitude of the user of vehicle 200, vehicle 100 generates an apology text, "Dear driver, the driver of the car in front apologizes to you, there is an emergency at home," and a gratitude text, "He sent you a heart, thank you for your courtesy."
[0129] Optionally, vehicle 100 can also determine abnormal driving records of vehicle 200 (e.g., vehicle 200 failing to activate its left turn signal when changing lanes to the left) and driving records of vehicle 100 (e.g., vehicle 100 performing sudden braking and honking its horn after sudden braking) based on data collected by sensors outside the cabin. Based on the abnormal driving records of vehicle 200 and the driving records of vehicle 100, vehicle 100 can determine that driver A inside vehicle 100 is currently quite angry, and vehicle 100 can further process this text content. For example, in addition to expressing apologies and gratitude to driver A in vehicle 100, it can add emotionally reassuring text content (e.g., "Don't be angry").
[0130] After the above secondary processing, vehicle 100 can use the text-to-speech (TTS) module to convert the secondary processed text content into voice information, and issue a voice message to driver A through the speaker in vehicle 100's cabin: "Dear driver, the driver of the car in front apologizes to you. He has a family emergency. Please don't be angry. He sent you a heart. Thank you for your courtesy."
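As a non-limiting illustration, the secondary processing on the receiving side can be sketched as a small text-assembly step performed before the TTS module is invoked. The function name `secondary_process` and the `recipient_angry` flag are hypothetical; how vehicle 100 actually infers driver A's emotion from the driving records is not specified in this sketch, so the flag is simply passed in.

```python
from typing import List, Optional

# Example reassurance wording, modeled on the scenario above.
REASSURANCE = "Please don't be angry."


def secondary_process(
    apology: Optional[str],
    gratitude: Optional[str],
    recipient_angry: bool,
) -> str:
    """Assemble the text to be spoken in the receiving vehicle's cabin.

    The reassurance sentence is added only when the receiving vehicle has
    judged, from its own records, that its driver is likely angry.
    """
    parts: List[str] = []
    if apology:
        parts.append(apology)
    if recipient_angry:
        parts.append(REASSURANCE)
    if gratitude:
        parts.append(gratitude)
    return " ".join(parts)


text = secondary_process(
    apology="Dear driver, the driver of the car in front apologizes to you.",
    gratitude="He sent you a heart.",
    recipient_angry=True,
)
```

The resulting string would then be handed to whatever text-to-speech component the vehicle uses.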
[0131] In one embodiment, vehicle 200 detects that, while vehicle 200 is changing from the right lane to the left lane, vehicle 100 brakes suddenly (e.g., the deceleration of vehicle 100 is greater than or equal to a preset deceleration) and honks its horn or switches between low beam and high beam, and thus determines that vehicle 100 was forced to yield to vehicle 200. At this time, even without receiving voice information from driver B, vehicle 200 can, based on vehicle 200's abnormal driving record and vehicle 100's driving record, generate and send to the cloud server vehicle 100's license plate information, an apology text "Apologies to the driver of the following vehicle," and a thank-you text "Thank you for yielding." The cloud server can find vehicle 100's address based on its license plate information and send the apology and thank-you texts to vehicle 100 based on that address. Based on the abnormal driving records of vehicle 200 and the driving records of vehicle 100, vehicle 100 can determine that driver A in vehicle 100 is currently quite angry. Vehicle 100 can also further process this text content. For example, in addition to expressing apologies and gratitude to driver A in vehicle 100, it can add emotionally reassuring text content (e.g., "Don't be angry"). Vehicle 100 can use the speaker in the cabin to issue a voice message to driver A: "Dear driver, the driver of the car in front apologizes to you. There was an emergency at home. Please don't be angry. He showed you kindness and thanked you for yielding."
[0132] As shown in Figure 2B, when driver A is driving vehicle 100 on the road and notices vehicle 200 changing lanes from its current lane to the lane where vehicle 100 is located, driver A will slow down vehicle 100 to give way to vehicle 200 out of courtesy.
[0133] After driver B in vehicle 200 notices vehicle 100 yielding to them, they can issue a voice message saying, "Please thank the driver behind me." Upon receiving this voice message, vehicle 200 can first convert it into text using its ASR (Automatic Speech Recognition) module. Vehicle 200 can then understand the user's true intentions and current emotions based on this text.
[0134] For example, vehicle 200 can determine that the user's intent is "to express gratitude," the slot is "the driver behind," and the user's current emotion is gratitude. In response to obtaining the user's intent and slot, vehicle 200 can issue a voice message to the user, "Okay, I'll tell him now." Simultaneously, vehicle 200 can use data collected by an external camera to determine the license plate number of vehicle 100. Vehicle 200 can then send the user's intent and vehicle 100's license plate information to a cloud server. The cloud server can then determine vehicle 100's address based on its license plate information and, according to that address, send the user's intent to vehicle 100.
[0135] Optionally, vehicle 200 can determine abnormal driving records using data collected by sensors outside the cabin. For example, if vehicle 200 detects that vehicle 100 braked gently while vehicle 200 was changing from the right lane to the left lane, and detects no honking or switching between low and high beams, it can determine that vehicle 100 voluntarily yielded to vehicle 200. After obtaining the user's intent, vehicle 200 can determine that the user's intent includes expressing gratitude. Based on the abnormal driving records of vehicle 200 and the driving records of vehicle 100, vehicle 200 can also add other expressions of gratitude (e.g., sending a heart) to the user's intent. At this time, vehicle 200 can issue a voice message to the user, "Okay, I'll tell him right away, and I'll also send him a heart for you."
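As a non-limiting illustration, the two yielding cases described for Figures 2A and 2B (sudden braking accompanied by honking or beam switching, versus gentle braking with neither) can be sketched as a simple rule. The names `Observation`, `YieldKind`, and `classify_yield`, and the threshold value of 4.0 m/s², are hypothetical; the application only refers to "a preset deceleration" without giving a value.

```python
from dataclasses import dataclass
from enum import Enum


class YieldKind(Enum):
    FORCED = "forced"        # other vehicle braked hard and protested
    COURTEOUS = "courteous"  # other vehicle slowed gently, no protest
    UNKNOWN = "unknown"      # evidence does not fit either case


@dataclass(frozen=True)
class Observation:
    """What the lane-changing vehicle observed about the vehicle behind."""
    deceleration_mps2: float  # magnitude of the other vehicle's deceleration
    honked: bool
    switched_beams: bool      # toggled between low and high beams


# Hypothetical preset deceleration; the real value is not given in the source.
PRESET_DECELERATION_MPS2 = 4.0


def classify_yield(
    obs: Observation, preset: float = PRESET_DECELERATION_MPS2
) -> YieldKind:
    """Classify how the vehicle behind gave way during the lane change."""
    protested = obs.honked or obs.switched_beams
    braked_hard = obs.deceleration_mps2 >= preset
    if braked_hard and protested:
        return YieldKind.FORCED
    if 0 < obs.deceleration_mps2 < preset and not protested:
        return YieldKind.COURTEOUS
    return YieldKind.UNKNOWN


forced = classify_yield(Observation(5.0, honked=True, switched_beams=False))
courteous = classify_yield(Observation(1.0, honked=False, switched_beams=False))
```

Mixed evidence (for example, hard braking with no honking) is deliberately left as `UNKNOWN` here, since the scenarios above do not say how such cases are handled.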
[0136] When vehicle 100 receives the user's intention, it can generate a text message expressing gratitude: "Dear driver, the driver of the car in front thanks you for yielding and sends you a heartfelt message." Vehicle 100 can then use its TTS module to convert this text message into a voice message and transmit it to driver A through the speakers in the vehicle's cabin: "Dear driver, the driver of the car in front thanks you for yielding and sends you a heartfelt message."
[0137] As shown in Figure 2C, when driver A, driving vehicle 100, notices vehicle 200 changing lanes from its current lane to the lane where vehicle 100 is located, driver A will slow down vehicle 100. This may cause driver A to feel dissatisfied with driver B inside vehicle 200's cabin. At this time, driver A will honk the horn of vehicle 100 to express their dissatisfaction.
[0138] Upon noticing vehicle 100 honking its horn, driver B in vehicle 200 sends a voice message: "It's just cutting in line, I'll apologize and that's it." After receiving the voice message, vehicle 200 can first convert it into text using its ASR (Automatic Speech Recognition) module. Vehicle 200 can then understand the user's true intentions and current emotions based on this text.
[0139] For example, the NLU engine might determine the user's intent as "Why didn't you just apologize for cutting in line?" and the user's emotion as dissatisfaction or annoyance. Vehicle 200 can determine abnormal driving records using data collected by sensors outside the cabin. For instance, if vehicle 200 detects that, while vehicle 200 was changing from the right lane to the left lane, vehicle 100 braked suddenly (e.g., vehicle 100's deceleration is greater than or equal to a preset deceleration) and honked its horn or switched between low and high beams, it can determine that vehicle 100 did not voluntarily yield to vehicle 200. After obtaining the user's intent, vehicle 200 can determine that the slot information refers to vehicle 100 based on vehicle 200's abnormal driving records and vehicle 100's driving records. Vehicle 200 can also determine vehicle 100's license plate information using images captured by an external camera.
[0140] Optionally, the vehicle 200 can edit the user's intent, for example, by de-emotionalizing the user's intent and transforming dissatisfaction or annoyance into a sincere apology. For example, after de-emotionalization, the user's intent might be "Apologize to the driver of the car behind me for cutting in line."
[0141] Optionally, based on the abnormal driving records of vehicle 200 and the driving records of vehicle 100, vehicle 200 can also add other expressions of gratitude (e.g., sending a heart) to the user's intention. At this time, vehicle 200 can send a voice message to the user: "Dear car owner, we have apologized to the driver of the car behind and sent him a heart."
[0142] After receiving the user's intention and other expressions of gratitude from vehicle 200, vehicle 100 generates a text message expressing thanks: "Dear driver, the driver of the car in front thanks you for yielding and sent you a message of appreciation." Vehicle 100 can then convert this text message into voice information via a TTS module and transmit the corresponding voice message to driver A through the speakers in vehicle 100's cabin.
[0143] For example, Figure 3 shows a schematic diagram of the interaction scenario provided in the embodiment of this application.
[0144] As shown in Figure 3, when driver A is driving vehicle 100 at night and notices that oncoming vehicle 200 has its high beams on, the glare affects driver A's driving and may annoy driver A. At this point, driver A issues a voice message: "Could you please turn off your high beams?" After receiving the voice message, vehicle 100 can first convert it into text using the ASR (Automatic Speech Recognition) module. Vehicle 100 can then understand the user's true intentions and current emotions based on this text.
[0145] For example, the NLU engine might determine the user's intent as "Turn off your high beams," the slot as "oncoming vehicles," and the user's current emotion as dissatisfaction or frustration. After obtaining the user's intent, vehicle 100 can edit it. For instance, it can de-emotionalize the intent, resulting in "Turn off high beams," or it can add polite language, such as changing the intent to "Please turn off your high beams."
[0146] Optionally, vehicle 100 can also translate the user's intent. For example, the user's intent after de-emotionalization is "turn off the high beams". After translation, the user's true intent can be obtained as "switch the high beams to low beams".
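As a non-limiting illustration, the editing steps described above (de-emotionalization, translation into an actionable request, and adding polite language) can be sketched as a short pipeline. Everything in the sketch is an assumption made for illustration: the names `ParsedUtterance`, `TRANSLATIONS`, and `edit_intent`, the lookup-table approach, and the text normalization used as a stand-in for de-emotionalization. The application does not specify how these steps are implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedUtterance:
    """Hypothetical NLU output: intent text, slot, and emotion label."""
    intent: str
    slot: str
    emotion: str


# Hypothetical table mapping a neutral intent to the action actually wanted.
TRANSLATIONS = {
    "turn off your high beams": "switch the high beams to low beams",
}


def edit_intent(parsed: ParsedUtterance, polite: bool = True) -> str:
    """Return the text to forward to the other vehicle.

    Stand-in for de-emotionalization: the emotion label is not forwarded,
    and emphatic punctuation and capitalization are stripped.
    """
    neutral = parsed.intent.strip().rstrip("!?.").lower()
    # Translation: replace the literal request with the intended action.
    actionable = TRANSLATIONS.get(neutral, neutral)
    # Optional polite wrapper.
    return f"Please {actionable}." if polite else actionable


forwarded = edit_intent(
    ParsedUtterance(
        intent="Turn off your high beams!",
        slot="oncoming vehicles",
        emotion="frustration",
    )
)
```

A production system would more plausibly use a language model for these rewrites; the table lookup here only makes the order of the steps concrete.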
[0147] Simultaneously, vehicle 100 can obtain information about vehicle 200 traveling in the opposite direction based on the slot information. For example, vehicle 100 can determine the license plate information of vehicle 200 using images captured by an external camera. Vehicle 100 can then send the user's true intent and vehicle 200's license plate information to a cloud server. The cloud server can then locate vehicle 200's address based on its license plate information and, according to that address, send the de-emotionalized and translated version of the user's true intent to vehicle 200.
[0148] After receiving the user's true intention, vehicle 200 can transform that intention. For example, vehicle 200 can translate the user's intention into an expression that is easy for driver B to understand and simultaneously provide driver B with effective suggestions.
[0149] For example, when vehicle 200 detects that driver B is a novice driver, it can explain to driver B the reason for turning off the high beams (e.g., affecting oncoming drivers) and provide driving advice (suggesting switching to low beams). For example, vehicle 200 can issue a voice message to driver B: "Dear driver, you are currently using high beams, which may be affecting oncoming drivers. The oncoming drivers would appreciate it if you could switch to low beams."
[0150] For example, when vehicle 200 detects that driver B is an experienced driver, it can give driver B a concise message, such as "Oncoming vehicle, please switch to low beam headlights".
[0151] For example, Figure 4 shows a schematic diagram of the interaction scenario provided in the embodiment of this application.
[0152] As shown in Figure 4, driver A, while driving vehicle 100 in the overtaking lane, notices that vehicle 200 ahead is traveling at a slow speed (e.g., 65 km/h). This might cause driver A to feel dissatisfied with driver B inside vehicle 200. At this point, driver A can issue a voice message: "Please tell the car in front of me that if you're driving so slowly, don't occupy the fast lane; you're obstructing others." After receiving the voice message, vehicle 100 can first convert it into text using the ASR module. Vehicle 100 can then understand the user's true intentions and current emotions based on this text content.
[0153] For example, the NLU engine might determine the user's intent as "Relay the message that you're driving so slowly, don't occupy the fast lane," the slot as "the vehicle in front," and the user's current emotion as dissatisfaction or frustration. After obtaining the user's intent, Vehicle 100 can edit it, for example, by de-emotionalizing the intent to obtain a de-emotionalized intent, such as "Relay the message that you are driving too slowly in the fast lane." Simultaneously, Vehicle 100 can send a voice message to the user, "Okay, I've conveyed that to the other party."
[0154] Optionally, vehicle 100 can also translate the user's intent. For example, the user's intent after de-emotionalization is "convey to the vehicle ahead that its current speed in the fast lane is too slow." After translation, the user's true intent can be obtained as "convey to the vehicle ahead that it should speed up in the fast lane or move to the slow lane."
[0155] Vehicle 100 can obtain information about vehicle 200 located in front of it based on the slot information. Vehicle 100 can determine the license plate information of vehicle 200 through data collected by a camera outside the cockpit. Vehicle 100 can send the de-emotionalized user intent and vehicle 200's license plate information to a cloud server. The cloud server can find the address of vehicle 200 based on the license plate information and send the de-emotionalized and translated true user intent to vehicle 200 based on that address.
[0156] After receiving the user's true intention from the cloud server, vehicle 200 can transform that intention. For example, vehicle 200 can transform the user's intention into an expression that is easy for driver B to understand.
[0157] For example, if vehicle 200 detects that the number of times it has traveled on the current road is less than or equal to a preset number, or if vehicle 200 detects that driver B is a novice driver, vehicle 200 can inform driver B of the speed limit of the current fast lane and translate the user's intention into an expression that is easy for driver B to understand. For example, vehicle 200 can issue a voice message to driver B: "Dear driver, Xiao A reminds you that you are currently in the fast lane, the speed limit is 80-100 km/h, and your current speed is 65 km/h. The driver behind you is a bit impatient, please speed up."
[0158] For example, if vehicle 200 detects that it has traveled on the current road more times than the preset number, or if vehicle 200 detects that driver B is an experienced driver, vehicle 200 can translate the user's intention into a more concise expression for driver B, such as the voice message "The driver behind you would like you to speed up or change lanes."
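As a non-limiting illustration, the choice between a detailed and a concise prompt can be sketched as follows. The names `DriverContext`, `needs_detailed_prompt`, and `render_prompt`, and the preset trip count of 3, are hypothetical. The sketch also assumes that the detailed prompt is chosen whenever either condition for it holds (novice driver, or few trips on the current road); the application does not state how overlapping cases are resolved. The prompt wording is paraphrased from the examples above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DriverContext:
    """Hypothetical facts the receiving vehicle knows about its driver."""
    is_novice: bool
    trips_on_current_road: int


# Hypothetical preset number of trips; the real value is not given.
PRESET_TRIP_COUNT = 3


def needs_detailed_prompt(
    ctx: DriverContext, preset: int = PRESET_TRIP_COUNT
) -> bool:
    """Detailed prompt for novices or for drivers unfamiliar with the road."""
    return ctx.is_novice or ctx.trips_on_current_road <= preset


def render_prompt(
    ctx: DriverContext, limit_kmh: Tuple[int, int], current_kmh: int
) -> str:
    """Turn the received intent into wording suited to this driver."""
    if needs_detailed_prompt(ctx):
        low, high = limit_kmh
        return (
            "Dear driver, you are currently in the fast lane, "
            f"the speed limit is {low}-{high} km/h, "
            f"and your current speed is {current_kmh} km/h. "
            "The driver behind you would like you to speed up."
        )
    return "The driver behind you would like you to speed up or change lanes."


novice_prompt = render_prompt(DriverContext(True, 10), (80, 100), 65)
expert_prompt = render_prompt(DriverContext(False, 10), (80, 100), 65)
```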
[0159] For example, Figure 5 shows a schematic diagram of the interaction scenario provided in the embodiment of this application.
[0160] As shown in Figure 5, driver A, while driving vehicle 100 in their own lane, notices vehicle 200 driving over the lane lines. This might cause driver A to feel dissatisfied with driver B inside vehicle 200. At this point, driver A can issue a voice message: "What's wrong with this person? How can they drive like that?" After receiving the voice message, vehicle 100 can first convert it into text using its ASR (Automatic Speech Recognition) module. Vehicle 100 can then understand the user's true intentions and current emotions based on this text.
[0161] For example, the NLU engine might determine the user's intent as "complaining about abnormal driving behavior" and their current emotion as either complaining or angry. After obtaining the user's intent and current emotion, Vehicle 100 can issue calming voice messages (e.g., "Don't be angry, it's not worth getting upset") and recommend actions to the user to soothe their emotions (e.g., asking if they want to play soft music or turn on the air freshener). For instance, Vehicle 100 could use the cabin speakers to issue the voice message, "Dear owner, don't be angry, it's not worth getting upset. I can play some soft music for you, how about that?"
[0162] Optionally, vehicle 100 can determine information about vehicle 200 with abnormal driving records based on the user's intent and data collected by sensors outside the cabin. For example, vehicle 100 may determine that vehicle 200 is driving across lane lines and obtain vehicle 200's license plate information based on sensor data. Vehicle 100 can then translate the user's intent based on this abnormal driving record. For example, the translated user's true intent might be "Please ask the driver of the vehicle in front to control their vehicle and stay within the lane." Vehicle 100 can then send the user's true intent and vehicle 200's license plate information to a cloud server. The cloud server can then determine vehicle 200's address based on its license plate information and send the user's true intent to vehicle 200 based on that address.
[0163] Vehicle 200 can generate an expression that is easy for driver B to accept after receiving the true intent of the user of vehicle 100 from the cloud server.
[0164] For example, when vehicle 200 detects that driver B is a novice driver, it can inform driver B of the current abnormal driving behavior and provide driving suggestions. For example, vehicle 200 can issue a voice message to driver B: "Dear driver, you are currently driving over the lane lines, which may affect oncoming drivers. Please drive your vehicle in the lane as soon as possible."
[0165] For example, when vehicle 200 detects that driver B is an experienced driver, it can issue a concise expression to driver B, such as issuing a voice message "Caution, driving over the line".
[0166] For example, Figures 6A and 6B show schematic diagrams of the interactive scenarios provided in the embodiments of this application.
[0167] As shown in Figure 6A, vehicle 200 is in intelligent driving mode and is making a U-turn ahead. Vehicle 200 can determine that vehicle 100 is behind it and obtain vehicle 100's license plate information by using data collected from external sensors (e.g., a camera located at the rear of vehicle 200). At this time, vehicle 200 can send vehicle 100's license plate information and indication information 1 to the cloud server. Indication information 1 indicates that vehicle 200 is in intelligent driving mode and is about to make a U-turn ahead. The cloud server can determine vehicle 100's address based on vehicle 100's license plate information and send indication information 1 to vehicle 100 based on that address.
[0168] After receiving indication information 1 of vehicle 200 from the cloud server, vehicle 100 can generate a prompt message. This prompt message can include the status of vehicle 200 and a reminder for driver A. For example, vehicle 100 can use a TTS module to convert the prompt message into voice information and play the voice message "Dear driver, the vehicle ahead is in intelligent driving mode and wants to make a U-turn. Please be careful."
[0169] As shown in Figure 6B, vehicle 100 is in intelligent driving mode. When the driver in vehicle 200 notices an abnormal driving trajectory of vehicle 100, they issue a voice message, "How is this car being driven?" and express their dissatisfaction by honking the horn. Vehicle 200 can determine the abnormal driving trajectory of vehicle 100 (e.g., excessive lateral movement) based on its own information and the driving information of surrounding vehicles. Vehicle 200 can then obtain the user's intent, "Please, vehicle ahead, stop moving laterally." Vehicle 200 can send the user's intent and vehicle 100's license plate information to a cloud server. The cloud server can determine the address of vehicle 100 based on its license plate information and send the user's intent to vehicle 100 based on that address.
[0170] In response to receiving the user's intent, vehicle 100 can determine that vehicle 200 is behind vehicle 100 and obtain vehicle 200's license plate information based on data collected by external sensors (e.g., a camera located at the rear of vehicle 200) and the user's intent. At this time, vehicle 100 can send vehicle 200's license plate information and instruction information 2 to the cloud server. Instruction information 2 indicates that vehicle 100 is in intelligent driving mode. The cloud server can determine vehicle 200's address based on its license plate information and send instruction information 2 to vehicle 200 based on that address. Instruction information 2 includes an indication that vehicle 100 is in intelligent driving mode and an apology message.
[0171] After receiving instruction information 2 from vehicle 100 sent by the cloud server, vehicle 200 can generate a prompt message. This prompt message can include the status of vehicle 100 and content to remind the driver. For example, vehicle 200 can use a TTS module to convert the prompt message into voice information and play the voice message "Dear driver, the vehicle ahead is in intelligent driving mode, please excuse us" through the speakers in the cabin.
[0172] The above embodiments illustrate the analysis of user intent and slot information using the ASR and NLU modules, but the embodiments of this application are not limited thereto. For example, user input (e.g., voice input), vehicle information, and environmental information surrounding the vehicle can be input into an inference model (also referred to as a large model or multimodal model), which outputs information about another vehicle (e.g., license plate information) and the information to be conveyed to that other vehicle. The vehicle can then send both to a cloud server, and the cloud server can use the information about the other vehicle to deliver the conveyed information to that other vehicle.
[0173] For example, Figure 7 shows a schematic diagram of the interaction scenario provided in an embodiment of this application.
[0174] As shown in Figure 7, when driver A, driving vehicle 100, notices vehicle 200 changing lanes from its current lane to the lane where vehicle 100 is located, driver A will slow down vehicle 100. This may cause driver A to feel dissatisfied with driver B inside vehicle 200's cabin. At this time, driver A will honk the horn of vehicle 100 to express their dissatisfaction.
[0175] After detecting that vehicle 100 has honked its horn, vehicle 200 can determine the surrounding environmental information from data collected by its sensors. For example, vehicle 200 can determine from the surrounding environmental information that driver B decided to change lanes to the left because of road construction ahead. After determining that vehicles in the left and right lanes need to take turns passing in the road construction scenario, vehicle 200 can issue a voice message through the speaker in the cabin: "The driver who honked may not know about the road construction, I will inform him."
[0176] Vehicle 200 can determine the license plate information of vehicle 100 through images captured by an external camera. Vehicle 200 can then send the license plate information and instruction information 3 to a cloud server. This instruction information 3 indicates that the lane currently occupied by vehicle 200 is under construction.
[0177] After receiving instruction information 3 from vehicle 200 via the cloud server, vehicle 100 can determine, based on instruction information 3, the abnormal driving records of vehicle 200, and the driving records of vehicle 100, that driver A's honking may be due to unawareness of the road construction in the right lane, or that driver A may be aware of the construction but unaware that vehicles must alternate in this scenario. Vehicle 100 can then convey driving advice to the user, for example, by issuing a voice message through the cabin speaker: "Dear driver, the right lane is currently under construction. Alternating lanes are required. Please don't rush!"
[0178] Optionally, vehicle 100 can generate an expression that is easy for driver A to accept based on the information of driver A.
[0179] For example, when vehicle 100 detects that driver A is a novice driver, it can inform driver A that vehicle 200 is not driving abnormally and provide driving suggestions. For example, vehicle 100 can issue a voice message to driver A: "Dear driver, the right lane is currently under construction and requires alternating traffic. Please don't rush!"
[0180] For example, when vehicle 100 detects that driver A is an experienced driver, it can use a simpler expression for driver A, such as the voice message "Attention, road construction is underway on the right."
[0181] For example, Figures 8A-8C show schematic diagrams of the interactive scenarios provided in the embodiments of this application.
[0182] As shown in Figure 8A, when a user drives vehicle 100 to a narrow passage, they find that a vehicle in front is stopped in the narrow passage, preventing vehicle 100 from passing. At this time, the driver of vehicle 100 can issue a voice message, "Could you please tell the person in front to move their car? How are other people supposed to pass?" When vehicle 100 collects this voice message, it can input the voice message, vehicle 100 information (e.g., vehicle speed, road information), and environmental information outside the vehicle 100's cabin into the inference model, thereby obtaining the inference result. The inference result can include the voice message "Okay, I've already communicated with the other party" relayed to the driver of vehicle 100, vehicle 200's license plate information, and the first message conveyed to vehicle 200 (e.g., text content "Your car seems to be blocking the car behind, and the driver behind is in a hurry to pass").
[0183] In this embodiment of the application, the input to the inference model may include one or more of the following: voice, images, text, and video streams (images and video streams may be images or video streams collected by sensors inside and outside the cockpit).
[0184] Optionally, the inference model can be supplemented with external knowledge (e.g., road regulations, road conditions, best practices). This allows a general multimodal model to become a user-specific multimodal model when driving a vehicle; the output of the inference model can be the user's intent and information from another vehicle.
[0185] Depending on the current scenario (e.g., playing a racing game in the car, or a real-world driving scenario), the inference model can search for different types of external knowledge. For example, if a user is playing a game on the passenger-side screen and says "That car in front is so slow," the inference model can take this voice message and the status of the passenger-side screen as input and output the text or voice message "Pass it!" As another example, if a user is driving in an urban area behind a slow car, honks the horn, and says "That car in front is so slow," the inference model can take this voice message, the vehicle's current speed, lane information, and surrounding environmental information as input and output the voice message "The speed limit on this road is 50-70 km/h. The car in front is driving normally, please don't rush!"
[0186] The inference model can also output execution information for other devices inside the vehicle (e.g., tone, pitch, animation, color, etc.). For example, when outputting the voice message "The speed limit on this road is 50-70 km/h. The car in front is driving normally. Please don't rush!", it can also output a command to control the sound-emitting device to play light music.
[0187] Vehicle 100 can send the license plate information of vehicle 200 and the first message intended for vehicle 200 to the cloud server. The cloud server can determine, based on the license plate information of vehicle 200 and the stored association between license plate information and accounts, that the account corresponding to the license plate information of vehicle 200 includes multiple devices, such as vehicle 200, mobile phones, smartwatches, and tablets. The cloud server can then send the first message to the multiple devices under that account.
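The account-based delivery described above can be illustrated with a small sketch. The two lookup tables, the device identifiers, and the function name `fan_out` are assumptions made for illustration; this application does not specify how the cloud server stores these associations.

```python
from __future__ import annotations

# Hypothetical associations stored on the cloud server.
PLATE_TO_ACCOUNT = {"PLATE-200": "account-b"}
ACCOUNT_TO_DEVICES = {
    "account-b": ["vehicle-200", "phone-b", "watch-b", "tablet-b"],
}


def fan_out(plate: str, message: str) -> dict[str, str]:
    """Map every device under the account bound to the plate to the message.

    Returns an empty mapping when the plate is not associated with any account.
    """
    account = PLATE_TO_ACCOUNT.get(plate)
    if account is None:
        return {}
    return {device: message for device in ACCOUNT_TO_DEVICES.get(account, [])}


deliveries = fan_out(
    "PLATE-200",
    "Your car seems to be blocking the car behind, and the driver behind is in a hurry to pass",
)
```

Delivering to every device under the account means the message can still reach the owner when nobody is inside the vehicle, which is the situation in the narrow-passage scenario.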
[0188] As shown in Figure 8B, when the mobile phone under the account logged into vehicle 200 receives the first message, it can display a prompt box on the phone's screen. This prompt box includes the message "Dear driver, your vehicle seems to be blocking oncoming traffic. The driver behind is in a hurry to pass. Do you want to remotely control the vehicle?", a confirmation control, and a cancellation control. When the user clicks the confirmation control, the phone can display the environmental information around vehicle 200 together with a prompt box. This environmental information includes a dashed box, which indicates the suggested area to move to. The prompt box includes the message "We suggest you move your vehicle to the dashed box to facilitate the passage of oncoming traffic", a confirmation control, and a cancellation control. When the user clicks the confirmation control, the phone can send a control command to vehicle 200, instructing it to move from its current position to the area within the dashed box. The phone can also display the process of vehicle 200 moving towards the dashed box. When the phone detects that the user says "Please apologize to the driver behind me," the phone can input this voice message into the inference model to obtain the text content for the driver behind: "The car has been moved and I apologize to you." The cloud server can forward the text content to vehicle 100. Vehicle 100 can input the text content, vehicle information, and environmental information around the vehicle into the inference model to obtain the voice message to be broadcast to the driver of vehicle 100: "Dear car owner, the driver of the car in front has moved the car and apologized to you for the delay."
[0189] As shown in Figure 8C, vehicle 100 can control the speaker to emit a voice message: "Dear car owner, the driver of the car in front has moved the car and apologized to you for the delay."
[0190] Figure 9 shows a schematic flowchart of the interaction method 900 provided in an embodiment of this application. The method 900 includes:
[0191] S910, obtain the first intention of a user in the vehicle cabin, the first intention being associated with another vehicle and the first intention being associated with the user's negative emotions.
[0192] For example, the user's first intention can be obtained by analyzing a voice command, such as the voice message in Figure 3: "Could the car on the other side turn off its high beams?" Or, as shown in Figure 4: "Could you tell the car in front of you to stop driving so slowly and not occupy the fast lane, obstructing other drivers?"
[0193] For example, the user's first intention can be obtained based on the vehicle's driving record (e.g., the driver frequently switching between low beam and high beam, sudden braking, or sudden steering wheel turning) and the driving record of the other vehicle (e.g., the other vehicle is currently using its high beams).
[0194] For example, the user's first intention can be determined from the vehicle's driving record (e.g., honking in the fast lane) and the driving record of the other vehicle (e.g., driving too slowly in the fast lane).
[0195] For example, the user's first intention can also be determined from the user's actions (e.g., body language) and facial expressions (e.g., annoyance).
[0196] For example, the user's first intention can also be determined from facial information (e.g., the face turning red) detected by infrared sensors and thermal imaging sensors.
[0197] For example, even if the driver's emotions are not outwardly expressed, the first intention can still be obtained from the emotions of other passengers in the cabin. For instance, another passenger might say, "Could that car over there turn off its high beams?"
[0198] For example, the first intention can also be obtained from information such as heart rate or blood pressure detected by the user's watch.
[0199] For example, the first intention can also be determined from the user's grip strength on the steering wheel.
[0200] S920, perform de-emotionalization processing on the first intention to obtain a second intention.
[0201] For example, after the user's first intention is determined, it can be de-emotionalized to obtain a de-emotionalized second intention.
[0202] S930, the second intention is sent to the other vehicle.
[0203] In this embodiment, when a user expresses an intention that carries negative emotions, the intention is de-emotionalized to obtain a second intention, which is then sent to the other vehicle. Because the user's intention is processed a second time before transmission, the negative emotions are not transmitted. This achieves effective and friendly communication that conveys valid information, even when the interaction between vehicles is abnormal.
[0204] Optionally, the step of de-emotionalizing the first intention to obtain the second intention includes: de-emotionalizing the first intention to obtain a de-emotionalized intention; and adding polite language to the de-emotionalized intention to obtain the second intention.
[0205] In this embodiment, polite language can be added to the intent, thereby further achieving the goal of effective and friendly communication, avoiding friction between drivers, and preventing traffic accidents. Furthermore, adjusting the language style is more conducive to communication.
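The two-stage processing (de-emotionalization, then adding polite language) can be sketched as below. This is a minimal sketch, not the method of this application: it assumes the first intention has already been parsed into an actionable request plus an emotion label, and the `Intention` fields and the polite template are hypothetical. In practice an NLU module or inference model would perform both stages.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Intention:
    request: str            # actionable content, e.g. "turn off your high beams"
    emotion: Optional[str]  # detected negative emotion; None once removed
    wording: str            # text that would be conveyed to the other vehicle


def de_emotionalize(first: Intention) -> Intention:
    """Keep only the actionable request; drop the emotion label and raw wording."""
    return Intention(request=first.request, emotion=None, wording=first.request)


def add_polite_language(intent: Intention) -> Intention:
    """Wrap the neutral request in a polite template (the template is an assumption)."""
    polite = f"Hello, could you please {intent.request}? Thank you."
    return replace(intent, wording=polite)


first = Intention(
    request="turn off your high beams",
    emotion="annoyance",
    wording="Could the car on the other side turn off its high beams?",
)
second = add_polite_language(de_emotionalize(first))
```

Keeping the two stages as separate functions mirrors the optional nature of the polite-language step: the second intention can be either the output of `de_emotionalize` alone or the output of both stages.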
[0206] In this embodiment, user intent can be identified using an ASR module and an NLU module. When the NLU identifies the user's intent, de-emotionalization processing can be performed.
[0207] Optionally, before obtaining the first intent of the user in the vehicle cabin, the method further includes: obtaining the user's first voice command; and determining the first intent based on the first voice command. Before sending the second intent to the other vehicle, the method further includes: when the first voice command does not include slot information corresponding to the first intent, determining the information of the other vehicle based on the first intent and data collected by sensors outside the vehicle cabin; or, when the first voice command includes slot information corresponding to the first intent, determining the information of the other vehicle based on the slot information and the data collected by the sensors.
[0208] In this embodiment of the application, after obtaining the user's intent through voice command, if slot information is included, the information of another vehicle (which may be at least one of license plate information, body color, and brand) can be determined based on the slot information and the driving record of another vehicle; if slot information is not included, the information of another vehicle (which may be at least one of license plate information, body color, and brand) can be determined based on the user's intent and the data collected by the sensor.
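The two branches (slot information present or absent) can be sketched as follows. The `DetectedVehicle` record, the relative-position labels, and the intent name `turn_off_high_beams` are hypothetical; the sketch assumes a perception stack has already turned sensor data into a list of detections.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DetectedVehicle:
    plate: str
    position: str               # relative position, e.g. "front" or "oncoming"
    high_beam_on: bool = False  # observed state derived from sensor data


def resolve_target(
    slot: Optional[str], intent: str, detections: List[DetectedVehicle]
) -> Optional[DetectedVehicle]:
    """Pick the vehicle the user is talking about, or None if it cannot be resolved."""
    if slot is not None:
        # Slot present: the command names the vehicle (here, by relative position).
        for vehicle in detections:
            if vehicle.position == slot:
                return vehicle
        return None
    # Slot absent: infer the target from the intent plus observed behavior.
    if intent == "turn_off_high_beams":
        for vehicle in detections:
            if vehicle.high_beam_on:
                return vehicle
    return None


detections = [
    DetectedVehicle("PLATE-A", "front"),
    DetectedVehicle("PLATE-B", "oncoming", high_beam_on=True),
]
by_slot = resolve_target("front", "stop_driving_slowly", detections)
by_intent = resolve_target(None, "turn_off_high_beams", detections)
```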
[0209] In this embodiment of the application, the identification of the user's intent can also be achieved through an inference model (e.g., a multimodal large model).
[0210] Optionally, before obtaining the first intent of the user in the vehicle cabin, the method further includes: obtaining a second voice command from the user, the second voice command including the first intent; wherein, performing de-emotional processing on the first intent to obtain the second intent includes: inputting the second voice command, the vehicle information, and the vehicle's surrounding environment information into an inference model to obtain the second intent and the information of the other vehicle.
[0211] Optionally, obtaining the first intention of a user inside the vehicle cabin includes: obtaining the user's driving behavior and the driving record of the other vehicle; determining the first intention based on the user's driving behavior and the driving record of the other vehicle; wherein, before sending the second intention to the other vehicle, the method further includes: determining the information of the other vehicle based on the first intention and data collected by sensors outside the vehicle cabin.
[0212] In this embodiment, the user's intent can be determined based on the user's driving behavior inside the vehicle and the driving record of another vehicle. For example, if the vehicle detects that the user is pressing the horn for a long time or frequently switching between low beam and high beam, it can determine that the other vehicle is exhibiting abnormal driving behavior, thereby determining the user's intent (which is generally to instruct the other vehicle to remove the abnormal driving behavior).
[0213] Optionally, before sending the second intent to the other vehicle, the method further includes: determining that the second intent complies with road regulations.
[0214] In this embodiment of the application, whether the user's intention complies with road regulations can be checked before it is sent. For example, in the road construction scenario of Figure 7, the user's desire that other vehicles not cut in does not comply with road regulations (in a road construction scenario, vehicles should take turns passing).
[0215] This check can also be used to determine whether a user's voice command is valid. If it is valid, the inference model can determine how to communicate with the other party and provide an output. Determining validity can be considered part of the internal implementation of the inference model.
[0216] Optionally, sending the second intention to the other vehicle includes: sending the second intention to the other vehicle based on the signal strength and/or signal quality of the environment in which the vehicle is located.
[0217] Optionally, sending the second intention to the other vehicle based on the signal strength and/or signal quality of the environment in which the vehicle is located includes: when the signal strength is greater than or equal to a preset signal strength, and/or the signal quality is greater than or equal to a preset signal quality, sending the information of the other vehicle and the second intention to the cloud server, so that the cloud server sends the second intention to the other vehicle based on the information of the other vehicle.
[0218] Optionally, sending the second intention to the other vehicle includes: sending the second intention to the other vehicle via near-field communication when the signal strength is less than a preset signal strength and/or the signal quality is less than a preset signal quality.
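The channel selection can be sketched as a simple threshold check. The threshold values and units below are assumptions (the text only says "preset"), and the sketch picks one consistent reading of "and/or": the cloud path is used only when both strength and quality meet their presets, and near-field communication is used otherwise.

```python
from enum import Enum


class Channel(Enum):
    CLOUD = "cloud"
    NEAR_FIELD = "near_field"


# Hypothetical presets; the application does not give concrete values or units.
PRESET_STRENGTH_DBM = -100.0
PRESET_QUALITY_DB = -15.0


def select_channel(strength_dbm: float, quality_db: float) -> Channel:
    """Use the cloud server when both presets are met, otherwise near-field."""
    if strength_dbm >= PRESET_STRENGTH_DBM and quality_db >= PRESET_QUALITY_DB:
        return Channel.CLOUD
    return Channel.NEAR_FIELD


good_coverage = select_channel(-85.0, -10.0)
weak_signal = select_channel(-110.0, -10.0)
```

Other readings of "and/or" (for example, using the cloud path when either preset is met) would only change the boolean expression in `select_channel`.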
[0219] Figure 10 shows a schematic flowchart of the interaction method 1000 provided in an embodiment of this application. The method 1000 includes:
[0220] S1010, Receive a second intent from the vehicle, the second intent being an intent obtained after de-emotionalizing the first intent of the user in the vehicle cabin.
[0221] S1020, control a prompting device to prompt the user with the second intention.
[0222] Optionally, controlling the prompting device to prompt the user with the second intention includes: determining a first driving opinion based on the second intention; and controlling the prompting device to prompt the user with the second intention and the first driving opinion.
[0223] For example, the receiving device can adjust at least one of the voice, tone, or manner of speaking when sending the voice message based on the user's state. For instance, if the driver is currently irritable, the second intention can be conveyed humorously to ease the tension. Or, if the driver is currently in a good mood, the second intention can be stated directly.
[0224] Optionally, controlling the prompting device to prompt the user with the second intention includes: prompting the user with the second intention based on the status of the user in the driver's area.
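Adapting the prompt to the state of the user in the driver's area can be sketched as a lookup from state to delivery style. The state labels, the style labels, and the neutral fallback are assumptions introduced for illustration; how the state is detected is outside this sketch.

```python
from typing import Dict

# Hypothetical mapping from detected driver state to delivery style.
STYLE_BY_STATE: Dict[str, str] = {
    "irritable": "humorous",  # ease the tension
    "good_mood": "direct",    # state the second intention as is
}


def choose_style(driver_state: str) -> str:
    """Return the delivery style for a driver state; unknown states fall back to neutral."""
    return STYLE_BY_STATE.get(driver_state, "neutral")


def render_prompt(second_intention: str, driver_state: str) -> Dict[str, str]:
    """Bundle the text with the style a TTS or display component would apply."""
    return {"text": second_intention, "style": choose_style(driver_state)}


prompt = render_prompt("Please, vehicle ahead, stop moving laterally.", "irritable")
```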
[0225] Optionally, the second intent indicates abnormal driving behavior, and the method further includes: controlling the vehicle lights and/or an external projection to display first information, the first information being used to apologize to and/or thank the user inside the vehicle.
[0226] Figure 11 shows a schematic flowchart of the interaction method 1100 provided in an embodiment of this application. The method 1100 includes:
[0227] S1110, Obtain the first input from the user in the vehicle cabin, the vehicle information, and the environmental information around the vehicle.
[0228] For example, the first input can be voice messages from the user, user actions, or user facial expressions. The user here can be a user in the driver's area, or a user in another area.
[0229] For example, vehicle information may include one or more of the following: vehicle speed, vehicle actuator operation (e.g., frequent switching between low beam and high beam, emergency braking, excessive rate of change of steering wheel angle, honking, grip strength on the steering wheel, etc.), location, type of lane, type of road, and data collected by sensors in the cabin (e.g., data collected by infrared sensors and thermal imaging sensors).
[0230] For example, environmental information around the vehicle includes data collected by sensors outside the vehicle cabin, such as images and video streams captured by cameras outside the cabin.
[0231] Optionally, the first input may also include input from the user's electronic device (e.g., a mobile phone, smartwatch, or smart bracelet). For example, heart rate or blood pressure detected by the user's watch.
[0232] S1120, determine the first output based on the first input, the vehicle information, and the environmental information.
[0233] Optionally, the first output may include information (e.g., text content or voice information) to be conveyed to another vehicle.
[0234] Optionally, the method further includes: outputting control commands for actuators in the vehicle cabin (e.g., air conditioning, ambient lighting, fragrance, or in-vehicle speakers) based on the first input, the vehicle information, and the environmental information.
[0235] Optionally, determining the first output based on the first input, the vehicle information, and the environmental information includes: inputting the first input, the vehicle information, and the environmental information into the inference model to obtain the first output.
[0236] Optionally, the vehicle information includes at least one of the vehicle's speed, the type of road the vehicle is on, and the type of lane the vehicle is in.
[0237] S1130, send the first output to the other vehicle.
[0238] Optionally, the environmental information includes data collected by the vehicle's sensors.
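Steps S1110 to S1130 can be sketched as a thin pipeline around a pluggable inference model. The `FirstOutput` fields, the callable signature, and `stub_model` are assumptions for illustration; the stub returns a canned result and stands in for a real multimodal model.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class FirstOutput:
    target_plate: str  # information identifying the other vehicle
    message: str       # content to be conveyed to the other vehicle


# Assumed model interface: (first input, vehicle info, environment info) -> FirstOutput.
InferenceModel = Callable[[str, Dict[str, Any], Dict[str, Any]], FirstOutput]


def run_method_1100(
    first_input: str,
    vehicle_info: Dict[str, Any],
    env_info: Dict[str, Any],
    model: InferenceModel,
    send: Callable[[FirstOutput], None],
) -> FirstOutput:
    first_output = model(first_input, vehicle_info, env_info)  # S1120
    send(first_output)                                          # S1130
    return first_output


def stub_model(
    first_input: str, vehicle_info: Dict[str, Any], env_info: Dict[str, Any]
) -> FirstOutput:
    """Placeholder for a real multimodal model; ignores most of its inputs."""
    return FirstOutput(
        target_plate=env_info["vehicle_ahead_plate"],
        message="Your car seems to be blocking the car behind, "
        "and the driver behind is in a hurry to pass",
    )


sent: List[FirstOutput] = []
result = run_method_1100(
    "Could you please tell the person in front to move their car?",  # S1110 inputs
    {"speed_kmh": 0.0, "road_type": "narrow passage"},
    {"vehicle_ahead_plate": "PLATE-200"},
    stub_model,
    sent.append,
)
```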
[0239] This application also provides an interactive system. The interactive system includes a transmitting device and a receiving device. The transmitting device is configured to acquire a first intention of a user in a vehicle cabin, the first intention being associated with another vehicle and related to the user's negative emotions; the transmitting device is further configured to perform de-emotional processing on the first intention to obtain a second intention; the transmitting device is further configured to send the second intention to the other vehicle; and the receiving device is configured to control a prompting device to prompt the user with the second intention.
[0240] Figures 12 and 13 respectively show schematic diagrams of the system architecture provided in the embodiments of this application.
[0241] In this embodiment, external sensors (e.g., cameras) can identify the appearance information of nearby vehicles (e.g., color, brand, model), license plate numbers, relative positional relationships between vehicles, and current driving environment information (congestion, intersections, side roads). An external microphone can identify the horn characteristics of other vehicles (whether they honk, the number of horns, and the urgency), and identify surrounding environmental information (whether they honk). Through cameras, vehicle sensors, and an Autopilot System (ADS), the driving status and abnormal driving behavior of the driver's own vehicle or other vehicles can be identified, providing reference information to the driver.
[0242] Through an intelligent engine system, aided by a large model, the system can: understand the ambiguous semantics of user descriptions and map them to a specific nearby vehicle; effectively process the user's expression by combining it with current dynamic and static environmental information, the user's current state, and general handling procedures (knowledge) for the current situation, resulting in accurate, rational, polite, and appropriate communication; provide suitable feedback and suggestions to the user based on current state information, and assist the user in performing actions such as editing light signals, expressing gratitude, or apologizing.
[0243] Information is exchanged via cloud servers through information matching or mobile networks, or via near-field communication. This type of information exchange can protect the driver's privacy by not exposing their personal information; it also reduces ambiguity and improves communication efficiency and safety by eliminating the need for non-verbal indirect expressions (flashing lights, honking horns).
[0244] Based on natural language descriptions, the system maps information to nearby vehicles and exchanges information, breaking down information barriers between drivers, enabling communication, and effectively managing emotions.
[0245] In this embodiment, the system can process and filter the information expressed by the driver based on the driver's current state, environmental conditions, driving behavior, and other information. After extracting key information, it can perform secondary processing and summarization for interactive transmission. It can also fulfill a series of explicit expressions of intent on behalf of the driver, such as gratitude, apology, and opinions.
[0246] In this embodiment, the target vehicle is located based on information described in natural language so that a connection can be established (the license plate, VIN, etc. can be used as the ID).
[0247] In this embodiment, the interactive information is processed again based on the status information (person, vehicle, environment) to achieve effective and friendly communication, and to convey effective information without conveying negative emotions.
[0248] In this embodiment of the application, vehicles can perform point-to-point interaction or broadcast interaction.
[0249] In this embodiment, other vehicle hardware, such as headlights and projectors, can be used to transmit secondary language information.
[0250] In this embodiment, the projection may include DLP projection onto the ground for interacting with pedestrians, or projection onto a vehicle window for interacting with the surroundings.
[0251] In this embodiment of the application, vehicle-to-vehicle interaction may include mobile network interaction as well as near-field interaction (e.g., via star flash).
[0252] This application also provides an interactive device, which includes a module or unit for performing the above-described interactive method.
[0253] Figure 14 shows a schematic flowchart of an interaction method 1400 provided in an embodiment of this application. The method 1400 includes:
[0254] S1410, the first vehicle acquires a first input from a user in the first vehicle's cabin and environmental information around the first vehicle, the environmental information including information about a second vehicle.
[0255] For example, the first input may include the user's voice input.
[0256] For example, taking the first vehicle as vehicle 100 in Figure 3 and the second vehicle as vehicle 200 in Figure 3, the first input can be the voice input of driver A: "Could the car on the other side turn off its high beams?"
[0257] For example, taking the first vehicle as vehicle 100 in Figure 4 and the second vehicle as vehicle 200 in Figure 4, the first input can be the voice input of driver A: "Please tell the car in front of me that if you're driving so slowly, don't occupy the fast lane and you're obstructing others."
[0258] For example, taking the first vehicle as vehicle 100 in Figure 5 and the second vehicle as vehicle 200 in Figure 5, the first input can be the voice input of driver A, "What's wrong with this person? How is he driving?"
[0259] For example, the first input may include user input to one or more components in the vehicle. For example, the one or more components may include, but are not limited to, a steering wheel, lights, horn, accelerator pedal, or brake pedal.
[0260] For example, taking vehicle 100 in Figure 2A as the first vehicle and vehicle 200 in Figure 2A as the second vehicle, the first input can be the horn input of driver A.
[0261] For example, taking vehicle 200 in Figure 6B as the first vehicle and vehicle 100 in Figure 6B as the second vehicle, the first input can be the driver's horn input in vehicle 200.
[0262] For example, the first input may include the user's physiological characteristics information. For example, the physiological characteristics information may include the user's blood pressure, heart rate, facial expressions, etc.
[0263] For example, consider a user's facial expression. The vehicle can acquire image data from cameras inside the cabin and determine the user's facial expression based on that image data. For example, the user's facial expression might be a frowning expression.
[0264] For example, the first input may include the user's limb input. For instance, the limb input may be a gesture input.
[0265] For example, the first input can be data collected by a user's wearable device (e.g., a smartwatch, smart bracelet, etc.) inside the cabin.
[0266] For example, the first input can be input from one or more users in the cockpit.
[0267] Optionally, the environmental information includes data collected by sensors outside the cockpit.
[0268] For example, external sensors include, but are not limited to, one or more of cameras, lidar, millimeter-wave radar, and microphones.
[0269] S1420, the first vehicle determines the first output based on the first input and environmental information.
[0270] Optionally, before the first vehicle determines the first output based on the first input and environmental information, the method 1400 further includes: determining that the first input is associated with the user's negative emotions.
[0271] Optionally, determining that the first input is associated with the user's negative emotions includes: determining that the first input meets preset conditions.
[0272] For example, taking voice input as the first input, the preset condition includes that the text content corresponding to the voice input contains target semantics, which includes semantics related to negative emotions such as complaining, anger, impolite language, and modal particles. For example, target semantics related to complaining include "Why drive like that?" and "Obstructing others." For example, semantics related to modal particles include "ah".
[0273] For example, taking the first input as the user's input to the horn, the preset condition includes the duration for which the user presses the horn being greater than or equal to a preset duration.
[0274] For example, taking the first input as the user's input to the steering wheel, the preset condition includes the user's grip force on the steering wheel being greater than or equal to a preset grip force; or, the preset condition includes the rate of change of the steering angle of the steering wheel being greater than or equal to a preset rate of change within a preset time period.
[0275] For example, taking the user's facial expression as the first input, the preset conditions include the user's facial expression being a frown or an angry expression.
[0276] For example, taking the user's heart rate as the first input, the preset condition includes that the user's heart rate is greater than or equal to a preset heart rate.
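The preset conditions listed above can be combined into a single check, sketched below. Every threshold value, the phrase list, and the expression labels are hypothetical. Plain substring matching stands in for the semantic matching an NLU module would perform, and the rate-of-change condition for the steering angle is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical presets; the application does not give concrete values.
PRESET_HORN_SECONDS = 2.0
PRESET_GRIP_NEWTONS = 60.0
PRESET_HEART_RATE_BPM = 110
TARGET_SEMANTICS = ("why drive like that", "obstructing others")
NEGATIVE_EXPRESSIONS = {"frown", "angry"}


@dataclass
class FirstInput:
    voice_text: Optional[str] = None
    horn_press_seconds: Optional[float] = None
    grip_force_newtons: Optional[float] = None
    facial_expression: Optional[str] = None
    heart_rate_bpm: Optional[int] = None


def is_negative(inp: FirstInput) -> bool:
    """True if any available signal meets its preset condition."""
    checks = [
        inp.voice_text is not None
        and any(phrase in inp.voice_text.lower() for phrase in TARGET_SEMANTICS),
        inp.horn_press_seconds is not None
        and inp.horn_press_seconds >= PRESET_HORN_SECONDS,
        inp.grip_force_newtons is not None
        and inp.grip_force_newtons >= PRESET_GRIP_NEWTONS,
        inp.facial_expression in NEGATIVE_EXPRESSIONS,
        inp.heart_rate_bpm is not None
        and inp.heart_rate_bpm >= PRESET_HEART_RATE_BPM,
    ]
    return any(checks)


long_honk = is_negative(FirstInput(horn_press_seconds=3.0))
short_honk = is_negative(FirstInput(horn_press_seconds=0.5))
```

Treating the conditions as alternatives (any one suffices) is a design choice of this sketch; a deployment could equally require several signals to agree before treating an input as negative.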
[0277] Optionally, the first input is associated with the user's negative emotions, and the first output includes the de-emotionalized output.
[0278] In this embodiment, when a user's input is associated with negative emotions, the information to be conveyed to the second vehicle can be further processed based on the user's input and environmental information, thereby achieving de-emotional processing. This enables effective and friendly communication between vehicles, conveying effective information without transmitting negative emotions, and helps avoid conflicts between vehicles caused by the driver's negative emotions.
[0279] Optionally, the first input includes impolite language, and the first output includes polite language converted from the impolite language.
[0280] Based on the above technical solution, while performing de-emotionalized processing, the vehicle can also convert impolite language used by the user and send the converted polite language to the second vehicle. This helps to further achieve effective and friendly communication between vehicles, conveying effective information without transmitting negative emotions, and helps reduce the probability of conflicts between drivers.
[0281] Optionally, the first input is associated with the user's negative emotions, and the method 1400 further includes: controlling the prompting device to output a third output, the third output including output results for reassuring the user in the cabin.
[0282] Based on the above technical solution, when the first input is associated with the user's negative emotions, the system can also use this first input and environmental information to provide a soothing output to the user in the cabin. In this way, while achieving friendly interaction between vehicles, it can also soothe the user in the cabin, helping to alleviate negative emotions and improve the user's driving experience.
[0283] For example, the third output may include reassuring phrases.
[0284] For example, the first input can be the voice input of driver A as shown in Figure 4, and the third output can be voice output 1 "Okay, I have expressed it to the other party" and voice output 2 "The driver in front may be a novice driver".
[0285] For example, the third output may include execution instructions for one or more components within the cockpit.
[0286] Optionally, determining the first output based on the first input and environmental information includes: inputting the first input and environmental information into the first inference model to obtain the first output.
[0287] Based on the above technical solution, the first input and environmental information can be input into the first inference model to obtain the first output. Thus, end-to-end input and output can be achieved through the first inference model.
[0288] For example, the first inference model can be a multimodal model.
[0289] For example, taking the scenario shown in Figure 3 as an example, the voice input of driver A, "Could the car on the other side turn off its high beams?", and the image data (e.g., pictures or video streams) collected by the camera outside the cabin of vehicle 100 can be input into the first inference model. The first inference model can output the user's true intention, "Please switch the high beams to low beams."
[0290] For example, taking the scenario shown in Figure 4 as an example, the voice input of driver A, "Please tell the car in front of me not to drive so slowly and occupy the fast lane, it's obstructing others," and image data (e.g., pictures or video streams) collected by the camera outside the cabin of vehicle 100 can be input into the first inference model. The first inference model can then output the user's true intention, "Please speed up in the fast lane."
[0291] For example, taking the scenario shown in Figure 5 as an example, the voice input of driver A, "What's wrong with this person? How is he driving?", and image data (e.g., pictures or video streams) collected by the camera outside the cabin of vehicle 100 can be input into the first inference model. The first inference model can output the user's true intention, "Please do not drive over the lane lines."
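The three examples above share one interface: a voice input and exterior image data go in, and the user's true intention comes out. The following sketch only illustrates that interface. It replaces the first inference model with a lookup table, which is an assumption made for the example and not the model described in this application; the `scene_tag` field is a hypothetical summary of what the exterior camera observed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    voice_text: str  # what the driver said
    scene_tag: str   # hypothetical label derived from exterior image data


# Intent strings are taken from the Figure 3, 4, and 5 examples;
# the scene tags are invented for this illustration.
_INTENT_BY_SCENE = {
    "oncoming_high_beams": "Please switch the high beams to low beams.",
    "slow_vehicle_in_fast_lane": "Please speed up in the fast lane.",
    "vehicle_over_lane_line": "Please do not drive over the lane lines.",
}


def first_inference_stub(obs: Observation) -> Optional[str]:
    """Stand-in for the first inference model: returns a de-emotionalized intent.

    The voice text is deliberately ignored here to show that the environment
    is what disambiguates a vague complaint; a real model would use both.
    """
    return _INTENT_BY_SCENE.get(obs.scene_tag)
```

The Figure 5 utterance, "What's wrong with this person? How is he driving?", names no specific behavior, so in this sketch the intention is resolved entirely from the scene.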
[0292] Optionally, method 1400 further includes: obtaining information about a first vehicle; wherein the first vehicle determines a first output based on a first input and environmental information, including: the first vehicle determines the first output based on the first input, information about the first vehicle, and environmental information.
[0293] Based on the above technical solution, by combining user input within the vehicle cabin, vehicle information, and environmental information surrounding the vehicle, a first output can be determined, which can then be sent to a second vehicle or terminal device. This breaks down information isolation between drivers in the two vehicles, enabling vehicle-to-vehicle interaction. Furthermore, by incorporating vehicle information, the accuracy of the first output result can be further improved.
[0294] For example, the vehicle information includes the vehicle's historical driving records.
[0295] For example, the vehicle information may include one or more of the vehicle's speed, acceleration, and location.
[0296] For example, taking the scenario shown in Figure 3 as an example, the input to the second inference model can include the voice input of driver A, "Could the car on the other side turn off its high beams?", the lighting status of vehicle 100 (vehicle 100 currently has its low beams on, or the driver is frequently switching between low beams and high beams), and the image data collected by the camera outside the cabin of vehicle 100. The second inference model can then output the user's true intention, "Please switch the high beams to low beams".
[0297] For example, taking the scenario shown in Figure 4 as an example, the input to the second inference model can include the voice input of driver A, "Please tell the car in front of me not to drive so slowly and occupy the fast lane, it's obstructing others", the speed of vehicle 100 (e.g., 90 km / h), and the image data collected by the camera outside the cabin of vehicle 100. The second inference model can then output the user's true intention, "Please speed up in the fast lane".
[0298] Optionally, the first vehicle determines the first output based on the first input, the information of the first vehicle, and the environmental information, including: inputting the first input, the information of the first vehicle, and the environmental information into the second inference model to obtain the first output.
[0299] Based on the above technical solution, the first input, the information of the first vehicle, and the environmental information can be input into the second inference model to obtain the first output. Thus, end-to-end input and output can be achieved through the second inference model.
[0300] Optionally, the first inference model and the second inference model can be the same model.
[0301] Taking the case where the first and second inference models are the same multimodal model as an example, the input to this multimodal model includes, but is not limited to, one or more of the following: speech, images, text content, or video streams.
[0302] This multimodal large model can be trained from a general multimodal large model. For example, this multimodal large model includes a knowledge base. The knowledge base includes, but is not limited to, road regulations of one or more countries (or regions), speed limit information for road sections, and reasonable handling methods during vehicle interactions. This allows the general multimodal large model to become a dedicated multimodal large model for the vehicle interaction domain. The output of the multimodal large model can be one or more of the following: the user's true intention after de-emotionalization, the converted polite language, and the translation result of the user's true intention.
[0303] Optionally, method 1400 further includes: the first vehicle determining information about the second vehicle based on the first input and environmental information.
[0304] Based on the above technical solution, the information of the second vehicle can be determined by the user's input and environmental information in the first vehicle's cabin, thereby enabling the first output to be sent to the second vehicle or a terminal device. In this way, the accuracy of the identified vehicle for interaction can be ensured by combining user input and environmental information when determining the information of the second vehicle.
[0305] Optionally, the first vehicle determines the information of the second vehicle based on the first input and environmental information, including: the first vehicle inputs the first input and environmental information into a third inference model to obtain the information of the second vehicle.
[0306] For example, taking the scenario shown in Figure 3, the following data can be input into the second inference model: driver A's voice input "Could the car on the other side turn off its high beams?", the headlight status of vehicle 100 (vehicle 100 currently has its low beams on, or the driver is frequently switching between low and high beams), and image data collected by a camera outside the cabin of vehicle 100. This second inference model can then output the user's true intent, "Please switch the high beams to low beams," and the license plate information of vehicle 200.
[0307] For example, taking the scenario shown in Figure 4 as an example, the input to the second inference model can include the voice input of driver A, "Please tell the car in front of me not to drive so slowly and occupy the fast lane, it's obstructing others," the speed of vehicle 100 (e.g., 90 km / h), and the image data captured by the camera outside the cabin of vehicle 100. The second inference model can then output the user's true intention, "Please speed up in the fast lane," and the license plate information of vehicle 200.
[0308] Optionally, the first inference model, the second inference model, and the third inference model can be the same model.
[0309] Taking the case where the first, second, and third inference models are a single multimodal large model as an example, the output of the multimodal large model can also include information about the second vehicle. For example, the information about the second vehicle may include its license plate information.
[0310] Optionally, the first inference model and the third inference model may not be the same model.
[0311] For example, this third inference model can be implemented using a semantic recognition algorithm and an image segmentation algorithm. For example, when the first vehicle receives the user's voice input, "Could the red car in front go faster?", it can acquire image 1 captured by a camera outside the cockpit. Using a semantic recognition algorithm, the text content related to the target's attributes in the voice input (e.g., "red car") can be determined. The first vehicle can extract ROI1 and ROI2 from image 1 based on an image segmentation algorithm. ROI1 and ROI2 each include the red car. For example, image 1 can be divided into multiple regions; ROI1 can be region a within these regions, and ROI2 can be region b within these regions. If the user's gaze points to region a when the voice input is triggered, ROI1 can be selected as the target ROI. The first vehicle can send this target ROI to a cloud server. The cloud server can then analyze this target ROI to obtain the license plate information of the second vehicle.
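The target-ROI selection described above can be sketched as a filter on the attributes extracted from the voice input followed by a tie-break on the user's gaze. This is a minimal sketch under stated assumptions: the `ROI` type and `select_target_roi` function are hypothetical, and the semantic recognition and image segmentation steps are assumed to have already produced the attribute sets and regions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class ROI:
    name: str                   # e.g. "ROI1"
    region: str                 # image region containing the ROI, e.g. "a"
    attributes: FrozenSet[str]  # attributes detected in the ROI (assumed given)


def select_target_roi(
    rois: Sequence[ROI],
    wanted: FrozenSet[str],
    gaze_region: Optional[str],
) -> Optional[ROI]:
    """Pick the ROI that matches the spoken attributes and, if needed, the gaze."""
    # Keep only ROIs that contain every attribute mentioned in the voice input.
    candidates = [r for r in rois if wanted <= r.attributes]
    if len(candidates) == 1:
        return candidates[0]
    # Several matches (e.g. two red cars): use the region the user is looking at.
    for r in candidates:
        if r.region == gaze_region:
            return r
    return None  # no match, or still ambiguous without usable gaze data
```

With two red cars in image 1, as in the example, the gaze region decides between ROI1 and ROI2; the selected ROI would then be sent to the cloud server for license plate analysis.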
[0312] Optionally, method 1400 further includes: the first vehicle determining a first control command based on the first input and environmental information, the first control command being associated with one or more actuators; the first vehicle controlling one or more actuators to execute the first control command.
[0313] For example, taking vehicle 100 in Figure 2A as the first vehicle and vehicle 200 in Figure 2A as the second vehicle, the first input can be a horn input from driver A. The first vehicle can generate control commands for the sound-emitting device and ambient lighting based on the horn input and image data collected by sensors outside the cabin. For example, the control command for the sound-emitting device instructs it to play music that alleviates dissatisfaction (e.g., soothing music). For example, the control command for the ambient lighting includes controlling the ambient lighting color to be warm.
[0314] Based on the above technical solution, control commands can be obtained through the first input and environmental information. In this way, by executing control commands on one or more actuators, it is possible to alleviate the user's anger, resentment, or impatience.
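The first control command and its execution on actuators can be sketched as a small plan-and-dispatch step. The actuator names, action strings, and function names below are hypothetical and chosen to mirror the soothing-music and warm-lighting example; how the first vehicle actually derives the command from the first input and environmental information is not shown.

```python
from typing import Callable, Dict, List, Tuple

Command = Tuple[str, str]  # (actuator name, action), both hypothetical labels


def plan_first_control_commands(negative_emotion: bool) -> List[Command]:
    """Return soothing commands when the first input signals a negative emotion."""
    if not negative_emotion:
        return []
    return [
        ("sound_device", "play_soothing_music"),
        ("ambient_light", "set_warm_color"),
    ]


def execute_commands(
    commands: List[Command],
    actuators: Dict[str, Callable[[str], None]],
) -> List[str]:
    """Dispatch each command to its actuator; return the actuators that ran."""
    executed = []
    for name, action in commands:
        handler = actuators.get(name)
        if handler is None:
            continue  # actuator not fitted in this cabin: skip the command
        handler(action)
        executed.append(name)
    return executed
```

Keeping planning separate from dispatch means a cabin without ambient lighting still executes the remaining commands.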
[0315] S1430, the first vehicle sends a first output to the second vehicle or a terminal device, and the terminal device is associated with the second vehicle.
[0316] Optionally, the information of the second vehicle includes the license plate information of the second vehicle; wherein, the first vehicle sending the first output to the second vehicle or the terminal device includes: the first vehicle sending the first output and the license plate information of the second vehicle to the cloud server, so that the cloud server sends the first output to the second vehicle or the terminal device based on the license plate information of the second vehicle.
[0317] Based on the above technical solution, the vehicle can send the first output and the license plate information of the second vehicle to the cloud server, thereby enabling the cloud server to send the first output to the second vehicle using the license plate information of the second vehicle. By forwarding information through the cloud server, the information isolation between the drivers of the two vehicles can be broken down, enabling interaction between vehicles.
[0318] Optionally, the cloud server stores the association between the vehicle's license plate and the identification information of the terminal device (e.g., a mobile phone or vehicle).
[0319] In this embodiment of the application, the cloud server can store the correspondence between the identifier (ID) or address information of each account and the license plate information of one or more devices corresponding to each account.
[0320] For example, Table 1 shows the correspondence among each account, the address information of the one or more devices corresponding to each account, and the license plate information.
[0321] Table 1
[0322] For example, a cloud server can receive vehicle license plate information sent by multiple terminal devices. For instance, user 1 can enter the license plate number xxx associated with vehicle B while logged into account 1 on vehicle B, thereby enabling vehicle B to send the binding relationship between account 1 and license plate number xxx to the cloud server.
[0323] For example, user 2 can enter the license plate number yyy associated with vehicle C on mobile phone C, which is logged into account 2, so that mobile phone C can send the binding relationship between account 2 and license plate number yyy to the cloud server.
[0324] For example, the output of the aforementioned multimodal large model may include information about the license plate number xxx. After receiving the first output sent by the first vehicle and the license plate number xxx, the cloud server can determine to send the first output to vehicle B according to the association relationship shown in Table 1 above.
[0325] For example, the output of the aforementioned multimodal large model may include information about the license plate number yyy. After receiving the first output from the first vehicle and the license plate number yyy, the cloud server can determine to send the first output to mobile phone C based on the association shown in Table 1 above. After receiving the first output, mobile phone C can determine whether it is located inside the cabin of vehicle C. If mobile phone C is located inside the cabin of vehicle C, then mobile phone C can provide a prompt to the user based on the first output.
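The binding and forwarding behavior of the cloud server can be sketched as a lookup keyed by license plate. The `CloudRelay` class and its methods are hypothetical names for this illustration, the in-memory dictionary stands in for whatever storage the cloud server actually uses, and the account and plate values reuse the placeholders from the examples above.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class CloudRelay:
    """Toy stand-in for the cloud server's account and license plate bindings."""

    def __init__(self) -> None:
        # license plate -> list of (account, device) bound to that plate
        self._by_plate: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)

    def bind(self, account: str, plate: str, device: str) -> None:
        """Record the binding reported by a logged-in vehicle or phone."""
        self._by_plate[plate].append((account, device))

    def route(self, plate: str, first_output: str) -> List[Tuple[str, str]]:
        """Return (device, message) deliveries for the given license plate."""
        return [(device, first_output) for _, device in self._by_plate.get(plate, [])]


def phone_should_prompt(phone_in_cabin: bool) -> bool:
    """A phone presents the first output only while it is inside the cabin."""
    return phone_in_cabin
```

For example, after `bind("account 1", "xxx", "vehicle B")`, a call to `route("xxx", message)` yields one delivery addressed to vehicle B, while an unbound plate yields none.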
[0326] Optionally, the first vehicle sends a first output to the second vehicle or a terminal device, including: the first vehicle sending the first output to the second vehicle via near-field communication technology.
[0327] For example, taking the scenario shown in Figure 3 as an example, when the environmental information indicates that there is a vehicle 200 around the vehicle 100 and the current lighting status of the vehicle 200 indicates that the vehicle 200 has turned on its high beams, the vehicle 100 can send the first output to the vehicle 200 through NearLink (star flash) technology.
[0328] S1440, the second vehicle determines the second output based on the first output and environmental information.
[0329] For example, taking the scenario shown in Figure 3 as an example, the first inference model can output the user's true intention: "Please switch the high beams to low beams." Vehicle 200 can input the user's true intention in vehicle 100 and the image data collected by the camera outside the cabin of vehicle 200 into the fourth inference model, thereby obtaining the text content: "Dear driver, you are currently using high beams, which may affect the driving of oncoming drivers. The oncoming drivers would like you to switch to low beams."
[0330] For example, taking the scenario shown in Figure 4 as an example, the first inference model can output the user's true intention, "Please speed up in the fast lane." Vehicle 200 can input the user's true intention in vehicle 100 and the image data collected by the camera outside the cabin into the fourth inference model, thereby obtaining the text content, "Dear driver, Xiao A reminds you that you are currently in the fast lane, where the speed limit is 80-100 km / h. The vehicle is currently traveling at 65 km / h. The drivers behind you are in a hurry, please speed up."
[0331] For example, taking the scenario shown in Figure 7, the first vehicle can be vehicle 100 and the second vehicle can be vehicle 200. The first input can be the horn input from the driver inside vehicle 100. Vehicle 100 can input the horn input and the data collected by sensors outside the vehicle 100's cabin into the first inference model, thereby obtaining the user's intention "Please do not cut in line". Since the sensors outside vehicle 100's cabin do not detect the cone in front of vehicle 200, vehicle 100 will mistakenly believe that the user's intention complies with traffic regulations, and can thus send the user's intention to vehicle 200. After receiving the user's intention, vehicle 200 can input the user's intention and the data collected by sensors outside its cabin (including data related to road construction) into the fourth inference model, thereby obtaining the text content "The driver who honked may not know about the road construction, I will relay it to him."
[0332] Optionally, before the second vehicle determines the second output based on the first output and environmental information, the method 1400 further includes: the second vehicle controlling the prompting device to notify the user that the first output has been received from the first vehicle and to ask the user whether the prompt should be presented; wherein, the second vehicle determining the second output based on the first output and environmental information includes: in response to detecting an input from the user indicating that the prompt should be presented, the second vehicle determines the second output based on the first output and environmental information.
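The optional confirmation step can be sketched as a gate in front of the second-output computation. The function name, the wording of the notice, and the two callables are assumptions made for this illustration; `ask_user` stands for the prompting device plus the detection of the user's reply, and `determine` stands for whatever produces the second output.

```python
from typing import Any, Callable, Optional


def second_output_if_accepted(
    first_output: str,
    environment: Any,
    ask_user: Callable[[str], bool],
    determine: Callable[[str, Any], str],
) -> Optional[str]:
    """Compute the second output only after the user agrees to be prompted."""
    notice = "A message has been received from another vehicle. Present it?"
    if not ask_user(notice):
        return None  # user declined: the second output is never determined
    return determine(first_output, environment)
```

In this sketch a declined notice also skips the model call itself, so no second output is computed for a message the user does not want to hear.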
[0333] Optionally, the second vehicle determines the second output based on the first output and environmental information, including: the second vehicle determines the second output based on data collected by sensors in the vehicle cabin, the first output, and environmental information.
[0334] For example, the second vehicle can input the first output and environmental information into the fourth inference model, thereby obtaining the second output.
[0335] For example, the fourth inference model can be a multimodal large model. This multimodal large model can be trained from a general multimodal large model. For instance, this multimodal large model includes a knowledge base. The knowledge base includes, but is not limited to, road regulations of one or more countries (or regions), speed limit information for road sections, and reasonable handling methods during vehicle interactions. This allows a general multimodal large model to become a dedicated multimodal large model for the vehicle interaction domain.
[0336] For example, taking the scenario shown in Figure 3, the first vehicle can be vehicle 100 and the second vehicle can be vehicle 200. The first inference model can output the user's true intention: "Please switch the high beams to low beams." Vehicle 200 can input the user's true intention in vehicle 100, the data collected by sensors in the cabin (indicating that the driver in vehicle 200 is in a tense state), and the image data collected by the camera outside the cabin into the fourth inference model, thereby obtaining the text content: "Dear driver, you are currently using high beams, which may affect oncoming drivers. The oncoming driver would like you to switch to low beams. You can move the lever behind the right side of the steering wheel upwards to switch to low beams."
[0337] For example, taking the scenario shown in Figure 4, the first vehicle can be vehicle 100 and the second vehicle can be vehicle 200. The first inference model can output the user's true intention, "Please accelerate in the fast lane." Vehicle 200 can input the user's true intention in vehicle 100, the data collected by sensors in the cabin (indicating that the driver in vehicle 200 is in a pleasant state), and the image data collected by the camera outside the cabin into the fourth inference model, thereby obtaining the text content, "Dear driver, Xiao A reminds you that the driver behind you is a bit impatient, please accelerate!"
[0338] Based on the above technical solution, data collected by sensors within the vehicle cabin can be incorporated when determining the second output. This makes the second output more acceptable to the current users in the cabin, thus improving their driving and riding experience.
[0339] Optionally, the second vehicle determines the second output based on the first output and environmental information, including: the second vehicle determines the second output based on the first information of the driver in the second vehicle's cabin, the first output and environmental information, wherein the first information includes one or more of driving proficiency, driving habits or physiological characteristics.
[0340] For example, taking the scenario shown in Figure 4, the first vehicle can be vehicle 100 and the second vehicle can be vehicle 200. The first inference model can output the user's true intention, "Please speed up in the fast lane." Vehicle 200 can input the user's true intention in vehicle 100 and the driver's driving skill level in vehicle 200 (indicating that the driver in the second vehicle is a novice driver) into the fourth inference model, thereby obtaining the text content, "Dear driver, Xiao A reminds you that you are currently in the fast lane, where the speed limit is 80-100 km / h. The vehicle is currently traveling at 65 km / h. The drivers behind you are a bit anxious, please speed up."
[0341] For example, consider the scenario shown in Figure 4. Vehicle 200 can input the user's true intention in vehicle 100 and the driver's driving skill level in vehicle 200 (indicating that the driver in the second vehicle is a highly skilled driver) into the fourth inference model, thereby obtaining the text content "Dear driver, the driver behind you is in a hurry, please speed up".
[0342] For example, take the scenario shown in Figure 4. Vehicle 200 can input the user's true intention in vehicle 100 and the driver's driving habits in vehicle 200 (the frequency of driving in the slow lane in the past period of time is greater than or equal to a preset frequency) into the fourth inference model, so as to obtain the text content "Dear car owner, the driver behind is a bit anxious, you can switch to the slow lane".
[0343] For example, consider the scenario shown in Figure 4. Vehicle 200 can input the user's true intention in vehicle 100 and the physiological characteristics of the driver in vehicle 200 (indicating the driver is an elderly driver) into the fourth inference model, thereby obtaining the text content: "Dear driver, Xiao A reminds you that you are currently in the fast lane, where the speed limit is 80-100 km / h. The vehicle is currently traveling at 65 km / h. The drivers behind you are in a bit of a hurry. If you are not in a hurry, you can turn on your right turn signal and switch to the slow lane."
[0344] Based on the above technical solution, driver information can be incorporated when determining the second output. This allows the second output to better match the driver's profile, resulting in different outputs for different driver profiles. This helps improve the vehicle's intelligence and enhances the user's driving experience.
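The profile-dependent outputs in the Figure 4 examples above can be sketched as a selection over driver information. This is a rule-based stand-in, not the fourth inference model: the `DriverProfile` fields and the order in which the rules are checked are assumptions made for the illustration, while the message strings are copied from the examples.

```python
from dataclasses import dataclass

SPEED_FACTS = (
    "Dear driver, Xiao A reminds you that you are currently in the fast lane, "
    "where the speed limit is 80-100 km / h. The vehicle is currently traveling "
    "at 65 km / h. "
)
ELDERLY_TEXT = SPEED_FACTS + (
    "The drivers behind you are in a bit of a hurry. If you are not in a hurry, "
    "you can turn on your right turn signal and switch to the slow lane."
)
SLOW_LANE_TEXT = (
    "Dear car owner, the driver behind is a bit anxious, "
    "you can switch to the slow lane"
)
NOVICE_TEXT = SPEED_FACTS + (
    "The drivers behind you are a bit anxious, please speed up."
)
SKILLED_TEXT = "Dear driver, the driver behind you is in a hurry, please speed up"


@dataclass(frozen=True)
class DriverProfile:
    novice: bool = False             # driving proficiency
    prefers_slow_lane: bool = False  # driving habits
    elderly: bool = False            # physiological characteristics


def second_output_stub(profile: DriverProfile) -> str:
    """Choose a prompt for 'Please speed up in the fast lane.' by driver profile."""
    if profile.elderly:
        return ELDERLY_TEXT
    if profile.prefers_slow_lane:
        return SLOW_LANE_TEXT
    if profile.novice:
        return NOVICE_TEXT
    return SKILLED_TEXT  # short form for a highly skilled driver
```

The same first output therefore produces a detailed, explanatory prompt for a novice and a brief one for a skilled driver.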
[0345] Optionally, the second vehicle determines a second output based on the first output and environmental information, including: the second vehicle determines the second output based on the historical driving records of other vehicles around the second vehicle, the first output, and environmental information.
[0346] For example, taking the scenario shown in Figure 7, the first output could be the user's intention, "Please do not cut in line." Since the sensors outside vehicle 100's cabin do not detect the traffic cone in front of vehicle 200, vehicle 100 might mistakenly conclude that the user's intention complies with traffic regulations, and thus send the user's intention to vehicle 200. Upon receiving the user's intention, vehicle 200 can input the user's intention, the historical driving records of other vehicles around vehicle 200 (indicating that other vehicles have alternately passed through the road segment over a past period), and data collected by the sensors outside the cabin (including data related to road construction) into the fourth inference model, thereby obtaining the text content, "The driver who honked may not know about the road construction; I will relay this message to him."
[0347] Based on the above technical solution, the historical driving records of other vehicles around the vehicle can be considered when determining the second output. This makes the second output more accurate and more acceptable to users in the cabin.
[0348] Optionally, the second vehicle determines the second output based on the first output and environmental information, including: the second vehicle inputs the first output and environmental information into the inference model to obtain the second output.
[0349] S1450, the second vehicle controls the prompting device to output the second output.
[0350] For example, after obtaining the text content of the fourth inference model, the second vehicle can control the sound-emitting device in the second vehicle's cabin to play the corresponding voice information.
[0351] Figure 15 shows a schematic diagram of the architecture of the system 1500 provided in an embodiment of this application. The system 1500 includes a vehicle 1510 and a vehicle 1520. The vehicle 1510 may include a first inference model, and the vehicle 1520 may include a fourth inference model.
[0352] Vehicle 1510 can be vehicle 100, and vehicle 1520 can be vehicle 200. Alternatively, vehicle 1510 can be the first vehicle, and vehicle 1520 can be the second vehicle.
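The data flow of system 1500 can be sketched as a composition of two models and a transport. All names in the sketch are hypothetical, and the models and the transport are passed in as plain callables because their internals are outside the scope of the illustration; the step labels in the comments refer to S1430 through S1450 above.

```python
from typing import Any, Callable


def run_interaction(
    first_input: Any,
    env_1510: Any,
    env_1520: Any,
    first_model: Callable[[Any, Any], str],
    transmit: Callable[[str], str],
    fourth_model: Callable[[str, Any], str],
    prompt: Callable[[str], None],
) -> str:
    """Run one message from vehicle 1510 through to the prompt in vehicle 1520."""
    # Vehicle 1510: derive the first output from the cabin input and surroundings.
    first_output = first_model(first_input, env_1510)
    # S1430: send the first output (cloud forwarding or near-field communication).
    received = transmit(first_output)
    # S1440: vehicle 1520 combines the received output with its own surroundings.
    second_output = fourth_model(received, env_1520)
    # S1450: vehicle 1520 drives its prompting device.
    prompt(second_output)
    return second_output
```

Each vehicle uses only its own environmental information, which is what allows vehicle 1520 to correct a first output that vehicle 1510 produced without seeing, for example, road construction ahead.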
[0353] Figure 16 shows a schematic block diagram of an interactive device 1600 provided in an embodiment of this application. The device 1600 includes: an acquisition unit 1610, configured to acquire a first input from a user in a vehicle cabin and environmental information around the vehicle, the environmental information including information about another vehicle; a determination unit 1620, configured to determine a first output based on the first input and the environmental information; and a sending unit 1630, configured to send the first output to another vehicle or a terminal device, the terminal device being associated with the other vehicle.
[0354] Optionally, the first input is associated with the user's negative emotions, and the first output includes the de-emotionalized output.
[0355] Optionally, the determining unit 1620 is used to: input the first input and environmental information into the first inference model to obtain the first output.
[0356] Optionally, the acquisition unit 1610 is also used to acquire vehicle information; the determining unit 1620 is used to: determine a first output based on the first input, the vehicle information and the environmental information.
[0357] Optionally, the determining unit 1620 is used to: input the first input, vehicle information and environmental information into the second inference model to obtain the first output.
[0358] Optionally, the determining unit 1620 is also configured to: determine information about another vehicle based on the first input and environmental information.
[0359] Optionally, the determining unit 1620 is used to: input the first input and environmental information into the third inference model to obtain information about another vehicle.
[0360] Optionally, the information of the other vehicle includes the license plate information of the other vehicle; wherein, the sending unit 1630 is used to: send the first output and the license plate information of the other vehicle to the cloud server, so that the cloud server sends the first output to the other vehicle or terminal device based on the license plate information of the other vehicle.
[0361] Optionally, the device further includes a control unit; the determining unit 1620 is further configured to determine a first control command based on the first input and environmental information, wherein the first control command is associated with one or more actuators; and the control unit is configured to control one or more actuators to execute the first control command.
[0362] Figure 17 shows a schematic block diagram of an interactive device 1700 provided in an embodiment of this application. The device 1700 includes: an acquisition unit 1710 for acquiring a first output of another vehicle and environmental information around the vehicle; a determination unit 1720 for determining a second output based on the first output and the environmental information; and a control unit 1730 for controlling a prompting device to output the second output.
[0363] Optionally, the determining unit 1720 is used to: determine a second output based on data collected by sensors in the vehicle cabin, a first output, and environmental information.
[0364] Optionally, the determining unit 1720 is used to: determine a second output based on the first information of the driver in the vehicle, the first output, and environmental information, wherein the first information includes one or more of driving proficiency, driving habits, or physiological characteristics.
[0365] Optionally, the determining unit 1720 is used to: determine a second output based on the historical driving records of other vehicles around the vehicle, the first output, and environmental information.
[0366] Optionally, the determining unit 1720 is used to: input the first output and environmental information into the inference model to obtain the second output.
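The optional inputs of the determining unit 1720 can be sketched as one input dictionary in which only the supplied sources appear. The class `Device1700`, the function `build_inputs`, and the dictionary keys are hypothetical names for this illustration; the `determine` callable stands in for the inference model, and `prompt` stands in for the control unit 1730 driving the prompting device.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


def build_inputs(
    first_output: str,
    environment: Any,
    cabin_data: Optional[Any] = None,      # data from in-cabin sensors
    driver_info: Optional[Any] = None,     # proficiency, habits, physiology
    nearby_history: Optional[Any] = None,  # driving records of nearby vehicles
) -> Dict[str, Any]:
    """Assemble model inputs, including optional sources only when provided."""
    inputs: Dict[str, Any] = {"first_output": first_output, "environment": environment}
    optional = {
        "cabin_data": cabin_data,
        "driver_info": driver_info,
        "nearby_history": nearby_history,
    }
    inputs.update({k: v for k, v in optional.items() if v is not None})
    return inputs


@dataclass
class Device1700:
    determine: Callable[[Dict[str, Any]], str]  # role of determining unit 1720
    prompt: Callable[[str], None]               # role of control unit 1730

    def handle(self, first_output: str, environment: Any, **optional: Any) -> str:
        """Determine the second output and pass it to the prompting device."""
        second_output = self.determine(build_inputs(first_output, environment, **optional))
        self.prompt(second_output)
        return second_output
```

Treating the extra sources as optional keys lets the same unit serve every variant described above without a separate code path for each combination.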
[0367] It should be understood that the division of units in the above device is only a logical functional division. In actual implementation, they can be fully or partially integrated into a single physical entity, or they can be physically separated. Furthermore, the units in the device can be implemented by a processor calling software; for example, the device includes a processor connected to memory, which stores instructions. The processor calls the instructions stored in memory to implement any of the above methods or to implement the functions of each unit in the device. The processor can be, for example, a general-purpose processor, such as a CPU or microprocessor, and the memory can be internal or external to the device. Alternatively, the units in the device can be implemented as hardware circuits. The functions of some or all units can be implemented through the design of the hardware circuits, which can be understood as one or more processors. For example, in one implementation, the hardware circuit is an ASIC, and the functions of some or all units are implemented through the design of the logical relationships between the components within the circuit. In another implementation, the hardware circuit can be implemented using a PLD, such as an FPGA, which can include a large number of logic gates. The connection relationships between the logic gates are configured through configuration files, thereby implementing the functions of some or all units. All units of the above devices can be implemented entirely through processor calling software, or entirely through hardware circuits, or partially through processor calling software with the remaining parts implemented through hardware circuits.
[0368] In the embodiments of this application, a processor is a circuit with signal processing capabilities. In one implementation, the processor can be a circuit with instruction reading and execution capabilities, such as a CPU, microprocessor, GPU, or DSP. In another implementation, the processor can implement certain functions through the logical relationships of hardware circuits, where these logical relationships are fixed or reconfigurable. For example, the processor may be a hardware circuit implemented as an ASIC or a PLD, such as an FPGA. In a reconfigurable hardware circuit, the process in which the processor loads a configuration file to configure the hardware circuit can be understood as the processor loading instructions to implement the functions of some or all of the above units. Furthermore, the processor can also be a hardware circuit designed for artificial intelligence, which can be understood as an ASIC, such as an NPU, TPU, or DPU.
[0369] As can be seen, each unit in the above device can be one or more processors (or processing circuits) configured to implement the above methods, such as: CPU, GPU, NPU, TPU, DPU, microprocessor, DSP, ASIC, FPGA, or a combination of at least two of these processor forms.
[0370] Furthermore, the units in the above devices can be integrated in whole or in part, or they can be implemented independently. In one implementation, these units are integrated together as a System-on-a-Chip (SoC). The SoC may include at least one processor for implementing any of the above methods or implementing the functions of the units in the device. The at least one processor may be of different types, such as CPU and FPGA, CPU and AI processor, CPU and GPU, etc.
[0371] This application also provides an interactive device, which includes a processing unit and a storage unit. The storage unit is used to store instructions, and the processing unit executes the instructions stored in the storage unit to cause the device to perform the methods or steps described in the above embodiments.
[0372] Optionally, if the interactive device is located in a vehicle, the aforementioned processing unit may be one or more of the processors 121-12n shown in FIG. 1.
[0373] This application also provides an interactive system, which includes the above-described interactive device and a sensing system.
[0374] This application also provides a vehicle that may include the aforementioned interactive device or interactive system.
[0375] This application also provides a computer program product, which includes computer program code that, when run on a computer, causes the computer to perform the methods described in the above embodiments.
[0376] This application also provides a computer-readable medium storing program code that, when run on a computer, causes the computer to perform the methods described in the above embodiments.
[0377] This application also provides a chip, which includes a circuit for performing the methods described in the above embodiments.
[0378] In implementation, each step of the above method can be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The method disclosed in the embodiments of this application can be directly performed by a hardware processor, or by a combination of hardware and software modules within the processor. The software modules can reside in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other storage media that are mature in the art. The storage medium is located in the memory, and the processor reads information from the memory and, in conjunction with its hardware, completes the steps of the above method. To avoid repetition, detailed descriptions are omitted here.
[0379] It should be understood that in the embodiments of this application, the memory may include read-only memory and random access memory, and provides instructions and data to the processor.
[0380] It should also be understood that, in the various embodiments of this application, the order of the above-mentioned processes does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.
[0381] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
[0382] Those skilled in the art will understand that, for the sake of convenience and brevity, for the specific working processes of the systems, devices, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which will not be repeated here.
[0383] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
[0384] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0385] In addition, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.
[0386] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0387] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in this application should be covered. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. An interaction method, characterized in that the method comprises: acquiring a first input from a user in a vehicle cabin and environmental information around the vehicle, the environmental information including information about another vehicle; determining a first output based on the first input and the environmental information; and sending the first output to the other vehicle or to a terminal device, the terminal device being associated with the other vehicle.
2. The method according to claim 1, characterized in that the first input is associated with a negative emotion of the user, and the first output includes a de-emotionalized output.
3. The method according to claim 1 or 2, characterized in that determining the first output based on the first input and the environmental information includes: inputting the first input and the environmental information into a first inference model to obtain the first output.
4. The method according to any one of claims 1 to 3, characterized in that the method further includes: acquiring information about the vehicle; and determining the first output based on the first input and the environmental information includes: determining the first output based on the first input, the vehicle information, and the environmental information.
5. The method according to claim 4, characterized in that determining the first output based on the first input, the vehicle information, and the environmental information includes: inputting the first input, the vehicle information, and the environmental information into a second inference model to obtain the first output.
6. The method according to any one of claims 1 to 5, characterized in that the method further includes: determining the information of the other vehicle based on the first input and the environmental information.
7. The method according to claim 6, characterized in that determining the information of the other vehicle based on the first input and the environmental information includes: inputting the first input and the environmental information into a third inference model to obtain the information of the other vehicle.
8. The method according to claim 6 or 7, characterized in that the information of the other vehicle includes the license plate information of the other vehicle; and sending the first output to the other vehicle or terminal device includes: sending the first output and the license plate information of the other vehicle to a cloud server, so that the cloud server sends the first output to the other vehicle or the terminal device based on the license plate information of the other vehicle.
9. The method according to any one of claims 1 to 8, characterized in that the method further includes: determining a first control instruction based on the first input and the environmental information, wherein the first control instruction is associated with one or more actuators; and controlling the one or more actuators to execute the first control instruction.
10. An interaction method, characterized in that the method comprises: acquiring a first output of another vehicle and environmental information around the vehicle; determining a second output based on the first output and the environmental information; and controlling a prompting device to output the second output.
11. The method according to claim 10, characterized in that determining the second output based on the first output and the environmental information includes: determining the second output based on data collected by sensors in the vehicle cabin, the first output, and the environmental information.
12. The method according to claim 10 or 11, characterized in that determining the second output based on the first output and the environmental information includes: determining the second output based on first information of a driver in the vehicle, the first output, and the environmental information, wherein the first information includes one or more of the following: driving proficiency, driving habits, or physiological characteristics.
13. The method according to any one of claims 10 to 12, characterized in that determining the second output based on the first output and the environmental information includes: determining the second output based on the historical driving records of other vehicles around the vehicle, the first output, and the environmental information.
14. The method according to any one of claims 10 to 13, characterized in that determining the second output based on the first output and the environmental information includes: inputting the first output and the environmental information into an inference model to obtain the second output.
15. An interactive device, characterized in that the device comprises: an acquisition unit configured to acquire a first input from a user in a vehicle cabin and environmental information around the vehicle, the environmental information including information about another vehicle; a determining unit configured to determine a first output based on the first input and the environmental information; and a sending unit configured to send the first output to the other vehicle or a terminal device associated with the other vehicle.
16. The device according to claim 15, characterized in that the first input is associated with a negative emotion of the user, and the first output includes a de-emotionalized output.
17. The device according to claim 15 or 16, characterized in that the determining unit is configured to: input the first input and the environmental information into a first inference model to obtain the first output.
18. The device according to any one of claims 15 to 17, characterized in that the acquisition unit is further configured to acquire information about the vehicle; and the determining unit is configured to: determine the first output based on the first input, the vehicle information, and the environmental information.
19. The device according to claim 18, characterized in that the determining unit is configured to: input the first input, the vehicle information, and the environmental information into a second inference model to obtain the first output.
20. The device according to any one of claims 15 to 19, characterized in that the determining unit is further configured to: determine the information of the other vehicle based on the first input and the environmental information.
21. The device according to claim 20, characterized in that the determining unit is configured to: input the first input and the environmental information into a third inference model to obtain the information of the other vehicle.
22. The device according to claim 20 or 21, characterized in that the information of the other vehicle includes the license plate information of the other vehicle; and the sending unit is configured to: send the first output and the license plate information of the other vehicle to a cloud server, so that the cloud server sends the first output to the other vehicle or the terminal device based on the license plate information of the other vehicle.
23. The device according to any one of claims 15 to 22, characterized in that the device further includes a control unit; the determining unit is further configured to determine a first control instruction based on the first input and the environmental information, wherein the first control instruction is associated with one or more actuators; and the control unit is configured to control the one or more actuators to execute the first control instruction.
24. An interactive device, characterized in that the device comprises: an acquisition unit configured to acquire a first output of another vehicle and environmental information around the vehicle; a determining unit configured to determine a second output based on the first output and the environmental information; and a control unit configured to control a prompting device to output the second output.
25. The device according to claim 24, characterized in that the determining unit is configured to: determine the second output based on data collected by sensors in the vehicle cabin, the first output, and the environmental information.
26. The device according to claim 24 or 25, characterized in that the determining unit is configured to: determine the second output based on first information of a driver in the vehicle, the first output, and the environmental information, wherein the first information includes one or more of the following: driving proficiency, driving habits, or physiological characteristics.
27. The device according to any one of claims 24 to 26, characterized in that the determining unit is configured to: determine the second output based on the historical driving records of other vehicles around the vehicle, the first output, and the environmental information.
28. The device according to any one of claims 24 to 27, characterized in that the determining unit is configured to: input the first output and the environmental information into an inference model to obtain the second output.
29. An interactive device, characterized in that the device comprises: a memory configured to store a computer program; and a processor configured to execute the computer program stored in the memory to cause the device to perform the method as described in any one of claims 1 to 14.
30. An interactive system, characterized in that the interactive system includes a computing platform and a sensing system, and the computing platform includes the interactive device as described in claim 29.
31. A vehicle, characterized in that the vehicle includes the interactive device as described in any one of claims 15 to 29, or the interactive system as described in claim 30.
32. A computer-readable storage medium, characterized in that the storage medium stores instructions that, when executed by a processor, cause the processor to implement the method as described in any one of claims 1 to 14.
33. A computer program product, characterized in that the computer program product includes computer program code that, when run on a computer, causes the computer to perform the method as described in any one of claims 1 to 14.
34. A chip, characterized in that the chip includes circuitry for performing the method as described in any one of claims 1 to 14.
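For illustration only, the sending-side flow recited in the claims (acquiring a first input and environmental information, determining the other vehicle, determining a de-emotionalized first output, and relaying it through a cloud server by license plate) could be sketched as follows. This is a minimal sketch under stated assumptions and does not limit the claims: the names `OtherVehicle`, `CloudServer`, `identify_other_vehicle`, `determine_first_output`, and `interact` are hypothetical, the keyword rules are placeholders for the inference models, and the in-memory inbox stands in for a real cloud service.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class OtherVehicle:
    # Hypothetical slice of the environmental information.
    license_plate: str
    relative_position: str  # e.g. "front", "left"


@dataclass
class CloudServer:
    """Toy in-memory relay keyed by license plate (assumption)."""
    inboxes: Dict[str, List[str]] = field(default_factory=dict)

    def relay(self, license_plate: str, first_output: str) -> None:
        self.inboxes.setdefault(license_plate, []).append(first_output)


def identify_other_vehicle(first_input: str, vehicles: List[OtherVehicle]) -> Optional[OtherVehicle]:
    """Placeholder for the third inference model: match a position keyword."""
    text = first_input.lower()
    for vehicle in vehicles:
        if vehicle.relative_position in text:
            return vehicle
    return vehicles[0] if vehicles else None


def determine_first_output(first_input: str) -> str:
    """Placeholder for the first inference model: return a polite message."""
    text = first_input.lower()
    if "cut" in text:
        return "Please leave more space when changing lanes. Thank you."
    if "slow" in text:
        return "Could you please speed up a little or let me pass? Thank you."
    return "Please drive carefully. Thank you."


def interact(first_input: str, vehicles: List[OtherVehicle], server: CloudServer) -> str:
    target = identify_other_vehicle(first_input, vehicles)
    if target is None:
        raise ValueError("no other vehicle found in the environmental information")
    first_output = determine_first_output(first_input)
    # Send the first output together with the license plate to the cloud server.
    server.relay(target.license_plate, first_output)
    return first_output


if __name__ == "__main__":
    server = CloudServer()
    nearby = [OtherVehicle("PLATE-A", "left"), OtherVehicle("PLATE-B", "front")]
    print(interact("That car in front cut me off!", nearby, server))
```

In this sketch the raw utterance never leaves the sending vehicle; only the de-emotionalized first output and the license plate are passed to the relay.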