A vehicle control method, device, readable medium, product, and vehicle

By recognizing a user's gestures and voice commands in the target sound zone of the vehicle, combining computer vision with speech recognition, the method lets users control in-vehicle devices by voice even when they do not know the devices' names, making vehicle control more user-friendly.

CN122244901A · Pending · Publication Date: 2026-06-19 · BEIJING CHJ AUTOMOTIVE TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
BEIJING CHJ AUTOMOTIVE TECH CO LTD
Filing Date
2022-03-23
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

When users are unfamiliar with the names of vehicle equipment, they cannot effectively use voice control, resulting in an unfriendly interaction.

Method used

By obtaining the gesture information and voice command of the target user in the target sound zone and combining computer vision with speech recognition, the system identifies the direction of the user's gesture and controls the target device, so the user does not need to state the device name explicitly.

Benefits of Technology

It enables precise control through natural gestures and voice commands without needing to know the device name, improving user experience and interaction efficiency while reducing the rate of misoperation.

✦ Generated by Eureka AI based on patent content.

Figure CN122244901A_ABST

Abstract

This disclosure relates to a vehicle control method, device, readable medium, product, and vehicle, in the field of intelligent control, and can be applied to computing devices. The method includes obtaining gesture information and a first voice command of a target user in a woken-up target sound zone, and controlling a target device according to both. The user therefore does not need to know the specific name of the device in advance: a natural gesture combined with a functional voice command (such as "open this") is enough to control it, which fundamentally solves the problem that users unfamiliar with device names cannot interact effectively by voice.

Description

[0001] This application is a divisional application of patent application No. 202210292172.5, filed on March 23, 2022, entitled "A voice control method, device, equipment and storage medium for a device".

Technical Field

[0002] This disclosure relates to the field of intelligent control, and more particularly to a vehicle control method, device, readable medium, product, and vehicle.

Background

[0003] With the continuous development of the automotive industry, in-vehicle infotainment systems are constantly being updated and upgraded, and vehicle control functions are being enhanced. Many in-vehicle infotainment systems are equipped with voice assistants, allowing users to control the vehicle via voice.

[0004] Currently, users need to explicitly state the device they want to control and the function they want to achieve. For example, a user needs to say "turn on the left reading light" for the voice assistant to turn it on. This is clearly not user-friendly for those unfamiliar with device names; when a user does not know the name of the controllable device, they cannot perform the operation.

Summary of the Invention

[0005] To address the aforementioned technical problems, this disclosure provides a vehicle control method, apparatus, device, readable medium, product, and vehicle.

[0006] In a first aspect, this disclosure provides a vehicle control method, including: obtaining the gesture information and a first voice command of a target user corresponding to a woken-up target sound zone; and controlling a target device according to the gesture information and the first voice command. In this way, the user can control a device without knowing its specific name in advance, simply by using a natural gesture together with a functional voice command (such as "open this"), which fundamentally solves the problem that users unfamiliar with device names cannot interact effectively by voice.

[0007] In a second aspect, this disclosure provides a vehicle control device, comprising: a first acquisition unit, configured to acquire the gesture information and the first voice command of the target user corresponding to the woken-up target sound zone; and a first control unit, configured to control the target device according to the gesture information and the first voice command.

[0008] In a third aspect, this disclosure provides a computing device, including: a memory; a processor; and a computer program, where the computer program is stored in the memory and configured to be executed by the processor to implement the method described in the first aspect.

[0009] In a fourth aspect, this disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method described in the first aspect.

[0010] In a fifth aspect, this disclosure provides a computer program product comprising a computer program or instructions which, when executed by a processor, implement the method described in the first aspect.

[0011] In a sixth aspect, this disclosure provides a vehicle, characterized in that it includes the computing device described in the third aspect.

[0012] The vehicle control method, apparatus, device, and readable storage medium disclosed herein obtain the gesture information and the first voice command of the target user corresponding to the woken-up target sound zone, and control the target device according to the gesture information and the first voice command. This combination of voice and gesture lets the user control a device by voice without saying its name, which makes voice control more convenient and solves the problem that a user who does not know a device's name cannot control it by voice.

Description of the Drawings

[0013] The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments consistent with this disclosure and, together with the description, serve to explain the principles of this disclosure.

[0014] To more clearly illustrate the technical solutions in the embodiments of this disclosure or the prior art, the accompanying drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0015] Figure 1 is a flowchart of a vehicle control method provided in an embodiment of this disclosure;
Figure 2 is a schematic diagram of an application scenario provided in an embodiment of this disclosure;
Figure 3 is a schematic structural diagram of a vehicle control device provided in an embodiment of this disclosure;
Figure 4 is a schematic structural diagram of a computing device provided in an embodiment of this disclosure.

Detailed Description

[0016] To better understand the above-mentioned objectives, features, and advantages of this disclosure, the solutions disclosed herein will be further described below. It should be noted that, unless otherwise specified, the embodiments and features described herein can be combined with each other.

[0017] Numerous specific details are set forth in the following description in order to provide a full understanding of this disclosure, but this disclosure may also be implemented in other ways different from those described herein; obviously, the embodiments in the specification are only some, and not all, of the embodiments of this disclosure.

[0018] Currently, a vehicle can use multiple sound zones to collect user commands from different seating positions. Each user can wake up the voice assistant from the sound zone they are in and issue control commands, and the voice assistant then helps control the vehicle's controllable devices. However, users must explicitly state the name of the device they want to control and the function they want to achieve, which is inconvenient for users unfamiliar with device names. A method is therefore needed that lets users control a device by voice without stating its name.

[0019] This disclosure provides a vehicle control method, which will be described below with reference to specific embodiments.

[0020] Figure 1 is a flowchart of a vehicle control method provided in an embodiment of this disclosure. The method can be executed by a vehicle control device, which can be implemented in software and/or hardware. The vehicle control device can be configured in a computing device, such as a server or a terminal, for example in the vehicle's infotainment system or in a cloud server.

[0021] The vehicle control method shown in Figure 1 is described below with reference to the application scenario shown in Figure 2. As shown in Figure 2, the vehicle's infotainment system 201 can execute this method. The infotainment system 201 is equipped with a voice assistant, and this example illustrates how a user in sound zone 204 can control the controllable device 203. The method includes the following steps:

S101. Obtain the gesture information and the first voice command of the target user corresponding to the woken-up target sound zone.

[0022] The vehicle system 201 can obtain the gesture information and the first voice command of the target user corresponding to the target sound zone. The gesture information is the hand-movement information by which the user in the target sound zone indicates which device they intend to control. The following description takes recognizing the direction of the user's gesture from image information of the target sound zone as an example.

[0023] For example, the vehicle infotainment system 201 can first determine the target sound zone that has been woken up, that is, the zone in which a user issued a voice wake-up command from a specific area of the cabin (e.g., the left seat in the second row). In response to the wake-up command, the infotainment system 201 can use technologies such as sound source localization to determine the physical area from which the command was issued and identify it as the target sound zone. Determining the target sound zone ensures that subsequent information collection and interaction are directed at the user who issued the wake-up command in that zone (referred to in this embodiment as the target user).
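The zone-determination step can be illustrated with a deliberately simplified sketch. Instead of true sound source localization, it assumes each seat zone has its own microphone and picks the zone whose microphone recorded the wake-up phrase with the highest RMS energy. The zone names, the `min_energy` noise floor, and the function names are hypothetical and do not come from the disclosure.

```python
import math
from typing import Dict, Optional, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def select_target_zone(
    wake_audio: Dict[str, Sequence[float]],
    min_energy: float = 0.05,  # hypothetical noise floor
) -> Optional[str]:
    """Return the zone whose microphone heard the wake-up phrase loudest.

    Simplified stand-in for sound source localization: each zone is assumed
    to have its own microphone, and the zone with the highest RMS energy
    above the noise floor is taken as the origin of the wake-up command.
    Returns None if no zone exceeds the floor.
    """
    best_zone, best_energy = None, min_energy
    for zone, samples in wake_audio.items():
        energy = rms(samples)
        if energy > best_energy:
            best_zone, best_energy = zone, energy
    return best_zone


if __name__ == "__main__":
    frames = {
        "driver": [0.01, -0.02, 0.01],
        "rear_left": [0.4, -0.5, 0.45],
        "rear_right": [0.05, -0.04, 0.06],
    }
    print(select_target_zone(frames))  # rear_left
```

A plain energy comparison can be fooled when an occupant in a neighbouring zone speaks louder than the person waking the assistant, which is one reason a real system would more likely rely on microphone-array localization.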

[0024] After the target sound zone is determined, the vehicle system 201 can invoke the camera corresponding to the target sound zone to obtain image information covering that zone, and can activate the voice acquisition device of the target sound zone (such as a directional microphone array) to receive the specific operation command the user issues immediately after the wake-up command, i.e., the first voice command.

[0025] For example, in the application scenario shown in Figure 2, when a user in the target sound zone 204 utters a wake-up phrase such as "Hello, voice assistant," the vehicle system 201 confirms that this zone is the target sound zone for the current interaction and then proceeds to carry out the user's intent. The vehicle system 201 can perform two acquisition tasks. First, it obtains image information containing the target user's posture, especially hand movements, through a dedicated camera 202 that captures the target sound zone 204. Second, it obtains the user's first voice command, such as "turn on the lights" or "open the window," through the voice acquisition device 2041 (such as a microphone) of the target sound zone 204. At this point, the vehicle system 201 has acquired both the image information and the voice command for the target sound zone, which provide the input for the subsequent steps: analyzing the user's gesture direction from the image information and combining it with the first voice command to control the finally determined target device.

[0026] S102. Control the target device according to the gesture information and the first voice command.

[0027] The vehicle-mounted system 201 can control the target device based on gesture information and the first voice command.

[0028] For example, after the vehicle system 201 acquires the image information of the target sound zone 204, it can recognize the gesture in the image information.

[0029] For example, the vehicle system 201 can analyze the image data using computer vision (CV) recognition technology to determine the direction of the user's gesture.

[0030] After determining the gesture direction using CV recognition technology, the vehicle infotainment system 201 can determine the pointing path of the gesture from that direction and thereby identify which devices in the vehicle cabin lie on the pointing path. For example, if device 203 (referred to in this embodiment as the target device) is on the path pointed to by the user's gesture, the infotainment system 201 can identify device 203 as the device the user wants to control.
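One way to picture the pointing path is as a ray in a cabin coordinate frame. The sketch below assumes the CV stage already yields a hand position (`origin`) and a pointing vector (`direction`), and that the position of each controllable device is known in the same frame; it keeps devices within a hypothetical `tolerance` of the ray, ordered by distance along it. None of these names or values come from the disclosure. A single hit corresponds to the target device, while several hits correspond to the candidate devices discussed in the following paragraph.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Device:
    name: str
    position: Vec3  # device centre in an assumed cabin frame (metres)


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def devices_on_pointing_path(
    origin: Vec3,
    direction: Vec3,
    devices: List[Device],
    tolerance: float = 0.15,  # hypothetical max distance from the ray (m)
) -> List[str]:
    """Return names of devices lying near the pointing ray, nearest first."""
    norm = math.sqrt(_dot(direction, direction))
    if norm == 0.0:
        raise ValueError("gesture direction must be non-zero")
    d = (direction[0] / norm, direction[1] / norm, direction[2] / norm)

    hits = []
    for dev in devices:
        t = _dot(_sub(dev.position, origin), d)  # distance along the ray
        if t <= 0.0:
            continue  # device is behind the pointing hand
        closest = (origin[0] + t * d[0], origin[1] + t * d[1], origin[2] + t * d[2])
        offset = _sub(dev.position, closest)
        if math.sqrt(_dot(offset, offset)) <= tolerance:
            hits.append((t, dev.name))
    return [name for _, name in sorted(hits)]


if __name__ == "__main__":
    cabin = [
        Device("reading_light_rear_left", (0.0, 0.0, 1.0)),
        Device("sunshade_rear_left", (0.0, 0.05, 1.6)),
        Device("window_rear_left", (-0.6, 0.0, 0.5)),
    ]
    print(devices_on_pointing_path((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), cabin))
    # ['reading_light_rear_left', 'sunshade_rear_left']
```

The tolerance trades off missed devices against spurious candidates; a larger value tolerates imprecise pointing but produces more ambiguous results that must be resolved by the selection steps.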

[0031] It should be noted that there may be multiple devices along the path pointed to by the user's gesture. The vehicle system 201 can identify these multiple devices as candidate devices and then select the target device from among the candidate devices.

[0032] After the vehicle infotainment system 201 determines from the user's gesture that the user wants to control device 203, the voice assistant in the infotainment system 201 can control device 203 according to the voice information collected by the voice acquisition device 2041 of sound zone 204 (referred to in this embodiment as the first voice command). This voice information describes the function the user wants to achieve. For example, if the user wants to turn on the reading light but does not know its specific name, they can say "turn on the light" while pointing at the reading light, and the voice assistant can recognize the request and turn the reading light on.
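Combining the gesture-selected device with a device-agnostic command such as "turn on the light" can be sketched as a two-step lookup: extract an action from the utterance, then check that the target device supports it. The phrase table, the capability table, and the `device.action` output format are all hypothetical illustrations; a real voice assistant would use its own natural-language understanding rather than substring matching.

```python
from typing import Dict, Optional, Set

# Hypothetical action vocabulary: spoken phrase fragment -> abstract action.
ACTION_PHRASES: Dict[str, str] = {
    "turn on": "turn_on",
    "turn off": "turn_off",
    "open": "open",
    "close": "close",
}

# Hypothetical capability table: which actions each device accepts.
CAPABILITIES: Dict[str, Set[str]] = {
    "reading_light": {"turn_on", "turn_off"},
    "sunshade": {"open", "close"},
}


def parse_action(command: str) -> Optional[str]:
    """Extract the first known action phrase from a device-agnostic command."""
    text = command.lower()
    # Check longer, more specific phrases first.
    for phrase in sorted(ACTION_PHRASES, key=len, reverse=True):
        if phrase in text:
            return ACTION_PHRASES[phrase]
    return None


def resolve_control(target_device: str, first_voice_command: str) -> Optional[str]:
    """Combine the gesture-selected device with the spoken action.

    Returns a control string such as "reading_light.turn_on", or None if the
    command has no recognizable action or the device does not support it.
    """
    action = parse_action(first_voice_command)
    if action is None or action not in CAPABILITIES.get(target_device, set()):
        return None
    return f"{target_device}.{action}"
```

For example, `resolve_control("reading_light", "Turn on the light")` yields `"reading_light.turn_on"`, while the same sentence aimed at the sunshade yields `None`, which a system could treat as a cue to ask the user for clarification.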

[0033] In this embodiment of the disclosure, the target sound zone is strongly associated with the gesture information of the target user: the system first determines the specific area from which the wake-up command was issued (the target sound zone) and then locates the target device indicated by the user's gesture within that zone. This ensures that the interaction command comes from the correct source, avoids false wake-ups and misoperations when several occupants are in the vehicle, and maps a vague voice intention accurately onto a specific physical device. The result is a more accurate, reliable, and natural in-vehicle human-machine interaction than voice control alone or gesture interaction that is not tied to a sound zone. Thus, by combining the target user's gesture information with the first voice command in the target sound zone, the user does not need to know the device's specific name beforehand; a hand gesture together with a functional voice command (such as "open this") is enough to control it, which fundamentally solves the problem that users unfamiliar with device names cannot interact effectively by voice.

[0034] Based on the above embodiments, the vehicle control method may further include: if there are multiple devices along the path indicated by the gesture, that is, there are multiple candidate devices, displaying information about the candidate devices and prompting the user to select the device to be controlled.

[0035] When determining the device on the path pointed to by the gesture direction, there may be multiple devices on the path. In this case, the vehicle system 201 cannot determine the device that the user wants to control based solely on the gesture direction. Therefore, when there are multiple devices on the path pointed to by the gesture direction, such as reading lights and sunshades, the vehicle system 201 can display information about the multiple devices on the path pointed to by the gesture direction and prompt the user to select the device they want to control.

[0036] In some possible implementations, if there are multiple devices along the path pointed to by the gesture, i.e., there are multiple candidate devices, the vehicle system 201 cannot determine the device that the user wants to control based solely on the gesture direction.

[0037] The vehicle system 201 can output information about multiple candidate devices via voice, then obtain voice selection instructions for the multiple candidate devices, and determine the target device from the multiple candidate devices according to the voice selection instructions.

[0038] For example, the candidate devices include a reading light and a sunshade. The vehicle system 201 can announce the names of the candidate devices to the user in sound zone 204; for instance, the voice assistant can ask "Do you want to control the reading light or the sunshade?" If the user replies with the voice selection command "sunshade", the vehicle system 201 can determine the sunshade as the target device from the candidate devices based on that voice selection command.
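The announce-and-answer exchange can be sketched with two small functions: one composes the spoken question from the candidate list, and one matches the user's reply against spoken names for each candidate. The alias table (lower-case spoken names per device id) and the rule of re-prompting when zero or several candidates are mentioned are assumptions for illustration only.

```python
from typing import Dict, List, Optional


def build_prompt(candidates: List[str], aliases: Dict[str, List[str]]) -> str:
    """Compose the spoken question listing the candidate devices."""
    names = [aliases.get(dev, [dev])[0] for dev in candidates]
    return "Do you want to control the " + " or the ".join(names) + "?"


def match_voice_selection(
    reply: str, candidates: List[str], aliases: Dict[str, List[str]]
) -> Optional[str]:
    """Return the single candidate named in the reply.

    `aliases` maps each device id to hypothetical lower-case spoken names.
    Returns None when no candidate or more than one candidate is mentioned,
    so the caller can prompt the user again.
    """
    text = reply.lower()
    matched = [
        dev
        for dev in candidates
        if any(name in text for name in aliases.get(dev, [dev]))
    ]
    return matched[0] if len(matched) == 1 else None


if __name__ == "__main__":
    aliases = {"reading_light": ["reading light", "light"], "sunshade": ["sunshade", "shade"]}
    candidates = ["reading_light", "sunshade"]
    print(build_prompt(candidates, aliases))
    print(match_voice_selection("Sunshades", candidates, aliases))
```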

[0039] In this way, when multiple candidate devices appear along the path of a user's gesture, leading to ambiguity in intent, the system guides the user to make a clear verbal selection by broadcasting the candidate device information and receiving voice selection instructions. The user only needs to use the most intuitive voice response (such as saying "sunshade") to complete the final selection of the target device, thereby improving the smoothness of human-computer interaction in scenarios with multiple candidate devices while ensuring control accuracy.

[0040] In some possible implementations, when there are multiple candidate devices, the vehicle system 201 can display information about the candidate devices on the touch screen of the target sound zone, obtain a touch selection instruction for the displayed information, and determine the target device from the candidate devices according to that touch selection instruction.

[0041] For example, the vehicle system 201 can also display information of multiple candidate devices on the touch screen 2042 corresponding to the target audio zone (204 in this embodiment), and prompt the user to select which device among the multiple candidate devices they want to control. For example, the icons of reading lights and sunshades can be displayed on the touch screen for the user to click and select.

[0042] For example, if a user clicks the control corresponding to the reading light icon, the vehicle system 201 can determine the reading light as the target device from multiple candidate devices based on the touch selection command.
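Mapping a tap on the zone's touch screen to a candidate device is essentially a hit test against the displayed icons. The sketch below assumes a single row of square icons with a fixed size and gap in screen pixels; the layout and all names are hypothetical, since the disclosure does not specify how the icons are arranged.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Icon:
    device: str
    x: int  # left edge in screen pixels
    y: int  # top edge in screen pixels
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        """True if the tap point falls inside this icon's rectangle."""
        return self.x <= px < self.x + self.width and self.y <= py < self.y + self.height


def layout_icons(candidates: List[str], size: int = 120, gap: int = 20) -> List[Icon]:
    """Place one square icon per candidate device in a single row (hypothetical layout)."""
    return [Icon(dev, i * (size + gap), 0, size, size) for i, dev in enumerate(candidates)]


def device_from_touch(icons: List[Icon], px: int, py: int) -> Optional[str]:
    """Map a tap to the candidate device under it, or None if the tap missed every icon."""
    for icon in icons:
        if icon.contains(px, py):
            return icon.device
    return None
```

With `layout_icons(["reading_light", "sunshade"])`, a tap at (50, 60) selects the reading light, a tap at (150, 10) selects the sunshade, and a tap in the gap at (130, 10) selects nothing.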

[0043] Displaying candidate device information (such as icons) on the touch screen associated with the target sound zone gives the user a clear, direct visual selection interface. The user can issue an unambiguous selection command with a simple touch operation (such as a tap), which greatly improves interaction efficiency and is especially suitable when voice commands may be unclear or cabin noise may interfere with speech recognition. This approach avoids the auditory confusion and memory burden that a spoken list of devices may cause, and achieves precise what-you-see-is-what-you-get (WYSIWYG) control, further reducing the error rate. It adds a reliable interaction method for scenarios with multiple candidate devices and thereby significantly improves the user experience.

[0044] In some possible implementations, when there are multiple candidate devices, the vehicle system 201 can display information about them on the touch screen of the target sound zone, and the user can then make the selection by voice. Once the device information is displayed, the user knows the name of the device they want to control and can issue a voice selection command that includes it, such as "turn on the reading light", so that the vehicle system 201 identifies the reading light as the target device.

[0045] In this embodiment of the disclosure, by broadcasting or displaying information about multiple devices on the path pointed to by the gesture direction and prompting the user to select the controlled device, the situation where it is impossible to determine the device the user wants to control when there are multiple devices on the path pointed to by the gesture direction can be avoided, thereby improving the convenience of the user in using the voice assistant and making the vehicle control method more complete.

[0046] Based on the above embodiments, before the image information of the woken-up target sound zone is acquired, the vehicle control method may further include: determining the controlled device based on the voice information; and, if the controlled device cannot be determined, executing the step of acquiring the image information of the target sound zone and the subsequent step S102.

[0047] For example, in some scenarios a user speaks the name of the device they want to control, and the vehicle system 201 can determine that device from the device-name portion of the user's voice message. Before acquiring the image information of the target sound zone 204, the vehicle system 201 first attempts to determine the controlled device from the user's voice message; if it succeeds, it executes the subsequent steps to control that device according to the voice message. If the controlled device cannot be determined from the voice message, the system executes the step of acquiring the image information of the target sound zone and the subsequent steps.
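The voice-first, gesture-fallback order can be sketched as follows. The device-name table is hypothetical, and `locate_by_gesture` is an assumed callable standing in for acquiring the zone's image and finding the devices on the pointing path; it is only invoked when the spoken command does not name exactly one known device, which is how the image analysis is skipped in the common case.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical mapping from spoken device names to device ids.
DEVICE_NAMES: Dict[str, str] = {
    "reading light": "reading_light",
    "sunshade": "sunshade",
    "window": "window",
}


def device_from_voice(command: str) -> Optional[str]:
    """Return the device id if exactly one known device name appears in the command."""
    text = command.lower()
    found = {dev for name, dev in DEVICE_NAMES.items() if name in text}
    return found.pop() if len(found) == 1 else None


def determine_controlled_device(
    command: str, locate_by_gesture: Callable[[], List[str]]
) -> List[str]:
    """Try the voice command first; fall back to the gesture pipeline only if needed.

    Returns a one-element list when voice identifies the device, otherwise
    whatever the gesture pipeline returns (the target or candidate devices).
    """
    device = device_from_voice(command)
    if device is not None:
        return [device]
    return locate_by_gesture()
```

Passing the gesture step in as a callable keeps the camera and CV work lazy: it runs only on the fallback branch.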

[0048] In this embodiment of the disclosure, the system first checks whether the controlled device can be identified from the voice information, and identifies it from the user's gesture only when it cannot. This avoids unnecessary image analysis and improves the efficiency and flexibility of the method.

[0049] In some possible implementations, after controlling the target device using the first voice command, voice information can be further collected through the voice acquisition device corresponding to the target voice region (which can be referred to as the second voice command), and the device can be controlled based on the collected voice information.

[0050] That is, the vehicle infotainment system 201 can obtain a second voice command from the target sound zone and control the target device according to it. The second voice command includes at least one of the following types: a control command containing a pronoun that refers to the target device; a control command containing the full or partial name of the target device; and a control command containing only a control action.

[0051] For example, after the vehicle system 201 determines the device the user wants to control, the user may continue to issue other control commands. The vehicle system 201 can keep collecting voice information through the voice acquisition device 2041 of the target sound zone and control device 203 accordingly. For example, the user may later issue a control command containing a pronoun that refers to the target device (e.g., "What is this?", "Show its content on another screen", "Make it brighter", "Put it away"), a control command containing the full or partial name of the target device (e.g., "Turn up the audio volume"), or a control command containing only a control action (e.g., "Turn up the volume", "A bit brighter"), which makes the target device easier to use.
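A small session object can illustrate how follow-up commands reuse the target device without a new gesture. The sketch sorts a second voice command into the three types listed in [0050] using simple word and substring checks; the pronoun list, the spoken-name table, and the extra `other_device` outcome (for a command that names some other device) are assumptions, not part of the disclosure.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical pronouns that refer back to the current target device.
PRONOUNS = {"it", "its", "this", "that"}


@dataclass
class FollowUpSession:
    """Remembers the target device so follow-up commands need no new gesture."""

    target_device: str
    spoken_names: Dict[str, str]  # hypothetical: lower-case spoken name -> device id

    def classify(self, command: str) -> str:
        """Sort a second voice command into pronoun / named / action-only."""
        text = command.lower()
        for name, device in self.spoken_names.items():
            if name in text:
                # Full or partial device name; flag names of other devices.
                return "named" if device == self.target_device else "other_device"
        if PRONOUNS & set(re.findall(r"[a-z]+", text)):
            return "pronoun"
        return "action_only"

    def route(self, command: str) -> Optional[str]:
        """Return the device a follow-up command applies to, or None."""
        if self.classify(command) == "other_device":
            return None  # assumption: hand this off to the normal pipeline
        return self.target_device
```

With `FollowUpSession("audio", {"audio": "audio", "window": "window"})`, "Make it brighter" is classified as a pronoun command, "Turn up the audio volume" as a named command, and "A bit brighter" as action-only; all three are routed to the audio device.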

[0052] In this embodiment, after controlling the target device using gesture information from the target audio region and a first voice command, voice information from the target audio region can be continuously collected, and the target device can be controlled based on the voice information, achieving continuous and natural deep interaction. After the user accurately locates and triggers the first control of the target device using the initial combination of gesture direction and the first voice command, there is no need to repeat the gesture direction action; subsequent voice commands are sufficient to continue controlling the same target device. Complex operations can be completed through continuous voice commands without interrupting the interaction, avoiding the tedious steps of re-identifying the target device for each control in traditional methods. This greatly simplifies the operation process and reduces the user's operational burden, making the human-computer interaction process smooth and efficient.

[0053] In some possible implementations, after the target device is controlled based on the gesture information and the first voice command, a control command for the target device is obtained and the target device is controlled according to it. Here, the control command may come from the target user in the target sound zone and/or from users in other sound zones in the vehicle, and it may be a second voice command and/or a gesture command.

[0054] For example, control commands can come not only from the target user but also from users in other sound zones in the vehicle, and can be second voice commands and/or gesture commands. Users in other sound zones (such as the front passenger or rear passengers) can thus issue follow-up control commands to the target device directly by voice or gesture (e.g., "turn it up a little," "turn it off," or a specific gesture). This makes collaborative control more flexible and convenient when several occupants are in the vehicle.

[0055] It should be noted that the control of the target device based on gesture information and the first voice command in this embodiment is similar to the implementation principle in the above embodiments, and can also be applied to the vehicle system 201.

[0056] A detailed explanation follows. The vehicle system 201 can obtain the gesture information and the first voice command of the target user corresponding to the target sound zone. The gesture information is the hand-movement information by which the user in the target sound zone indicates which device they intend to control. The following explanation takes recognizing the direction of the user's gesture from image information of the target sound zone as an example.

[0057] For example, the vehicle infotainment system 201 can first determine the target sound zone that has been woken up, that is, the zone in which a user issued a voice wake-up command from a specific area of the cabin (e.g., the left seat in the second row). In response to the wake-up command, the infotainment system 201 can use technologies such as sound source localization to determine the physical area from which the command was issued and identify it as the target sound zone. Determining the target sound zone ensures that subsequent information collection and interaction are directed at the user who issued the wake-up command in that zone (referred to in this embodiment as the target user).

[0058] After the target sound zone is determined, the vehicle system 201 can invoke the camera corresponding to the target sound zone to obtain image information covering that zone, and can activate the voice acquisition device of the target sound zone (such as a directional microphone array) to receive the specific operation command the user issues immediately after the wake-up command, i.e., the first voice command.

[0059] For example, when a user in the target sound zone 204 says a wake-up word such as "Hello, voice assistant," the vehicle system 201 confirms that this zone is the target sound zone for the current interaction and then proceeds to carry out the user's intent. The vehicle system 201 can perform two acquisition tasks. First, it obtains image information containing the target user's posture, especially hand movements, through a camera 202 dedicated to capturing the target sound zone 204. Second, it obtains the user's first voice command, such as "turn on the lights" or "open the window," through a voice acquisition device 2041 (such as a microphone) of the target sound zone 204. At this point, the vehicle system 201 has acquired both the image information and the voice command for the target sound zone, which provide the input for the subsequent steps: analyzing the user's gesture direction from the image information and combining it with the first voice command to control the finally determined target device.

[0060] The vehicle-mounted system 201 can control the target device based on gesture information and the first voice command.

[0061] For example, after the vehicle infotainment system 201 acquires the image information of the target sound zone 204, it can recognize the gesture in the image information, determine the gesture direction and its pointing path, and thereby determine which devices in the vehicle cabin lie on that path. For instance, if device 203 (referred to in this embodiment as the target device) is on the path indicated by the user's gesture, the infotainment system 201 can identify device 203 as the device the user wants to control.

[0062] It should be noted that there may be multiple devices along the path pointed to by the user's gesture. The vehicle system 201 can identify these multiple devices as candidate devices and then select the target device from among the candidate devices.

[0063] After the vehicle infotainment system 201 determines the device 203 that the user wants to control through the user's gesture, the voice assistant in the vehicle infotainment system 201 can control the device 203 based on the voice information (which can be called the first voice command in this embodiment) collected by the voice acquisition device 2041 corresponding to the voice zone 204. The voice information can be information about the function the user wants to achieve. For example, if the user wants to turn on the reading light but doesn't know its specific name, they can say "turn on the light" and point to the reading light, allowing the voice assistant to recognize and turn it on. This allows control of the target device based on gesture information and the first voice command.

[0064] The present application may also provide another embodiment, which can likewise be applied to the vehicle infotainment system 201 described above. In this embodiment, the infotainment system 201 determines whether each area in the vehicle is the target sound zone. If an area is not the target sound zone, the system does not respond to the gesture information or voice commands of the user in that non-target sound zone, and therefore does not control the corresponding device. A non-target sound zone is a sound zone that has not been woken up.

[0065] If the area is the target sound zone, the target device is controlled in response to the gesture information and the first voice command of the target user corresponding to that zone. The target sound zone is the sound zone activated by the wake-up command. Each sound zone corresponds to a specific seat position in the vehicle, such as the driver's seat, the front passenger seat, or the left or right side of the rear seats.
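The gating rule in [0064] and [0065] can be sketched as a small filter placed in front of device control. This sketch assumes a single awakened zone at a time and uses the seat positions named above as zone identifiers; `Zone`, `ZoneGate`, `on_wake_word`, and `handle` are hypothetical names.

```python
from enum import Enum


class Zone(Enum):
    DRIVER = "driver"
    FRONT_PASSENGER = "front_passenger"
    REAR_LEFT = "rear_left"
    REAR_RIGHT = "rear_right"


class ZoneGate:
    """Pass input through only from the sound zone awakened by the wake-up command."""

    def __init__(self):
        self.awakened = None  # no zone awakened yet

    def on_wake_word(self, zone):
        # Assumption: a new wake-up command moves the target zone.
        self.awakened = zone

    def is_target(self, zone):
        return self.awakened is not None and zone == self.awakened

    def handle(self, zone, gesture, voice_command):
        if not self.is_target(zone):
            return None  # non-target (unawakened) zone: do not respond
        # Hand the accepted pair on to device control (stubbed as a tuple).
        return zone, gesture, voice_command


if __name__ == "__main__":
    gate = ZoneGate()
    gate.on_wake_word(Zone.REAR_LEFT)
    print(gate.handle(Zone.DRIVER, "point", "open this"))  # None
    print(gate.handle(Zone.REAR_LEFT, "point", "open this") is not None)  # True
```

In a real pipeline such a check could run before gesture and speech recognition so that non-target zones consume no recognition resources; here it is applied to already-recognized inputs for brevity.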

[0066] The implementation principle of this embodiment is similar to that of the first embodiment, and will not be described in detail here. For details, please refer to the description in the first embodiment.

[0067] This embodiment clearly distinguishes between target and non-target sound zones and responds to them differently. It collects and responds only to gesture information and voice commands from the specific target sound zone that issued the wake-up command, and ignores interaction commands from all non-target sound zones. This isolates the interaction source from the outset and prevents interference, thereby resolving the problem of false triggering and misidentification caused by accidental actions or conversations of users in unawakened sound zones in a multi-user in-vehicle environment. This not only improves the accuracy and reliability of control and avoids misoperation, but also optimizes system resource allocation, so that computing and sensing resources focus only on the true intentions of valid users, thereby improving the user's human-computer interaction experience.

[0068] Figure 3 is a schematic diagram of a vehicle control device provided in an embodiment of this disclosure. The device 300 can be the terminal described in the above embodiments, or a component or assembly within that terminal. The vehicle control device 300 provided in this embodiment can execute the processing flow of the vehicle control method embodiment. As shown in Figure 3, it includes a first acquisition unit 310 and a first control unit 320. The first acquisition unit 310 is configured to acquire the gesture information and the first voice command of the target user corresponding to the awakened target voice region. The first control unit 320 is configured to control the target device according to the gesture information and the first voice command.

[0069] Optionally, the first control unit is specifically configured to: identify the user's gesture direction from the image information of the target audio region; determine the target device from the candidate devices based on the gesture direction, where the candidate devices are at least one device inside the vehicle on the path to which the gesture direction points; and control the target device according to the first voice command.

[0070] Optionally, the at least one device may include a plurality of candidate devices, and the first control unit is specifically configured to: output information about the plurality of candidate devices by voice; obtain a voice selection instruction for the plurality of candidate devices; and determine the target device from the plurality of candidate devices according to the voice selection instruction.

[0071] Optionally, the at least one device may include a plurality of candidate devices, and the first control unit is specifically configured to: display information about the plurality of candidate devices on the touchscreen in the target audio region; obtain a selection instruction for the displayed information; and determine the target device from the plurality of candidate devices according to the selection instruction.
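Paragraphs [0070] and [0071] leave the selection dialogue open. A minimal sketch follows, assuming candidate devices are identified by display names, that the spoken reply either names a device or uses an ordinal ("the second one"), and that the touchscreen reports the index of the tapped entry. `build_prompt`, `select_by_voice`, and `select_by_touch` are hypothetical helpers.

```python
# Hypothetical ordinal vocabulary for replies such as "the second one".
ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3}


def build_prompt(candidates):
    """Text to be spoken (or shown) listing the candidate devices."""
    listed = ", ".join(f"{i + 1}. {name}" for i, name in enumerate(candidates))
    return f"Which device do you mean? {listed}"


def select_by_voice(candidates, reply):
    """Resolve a spoken reply to one candidate, or None if nothing matches."""
    text = reply.lower()
    # Prefer an explicit device name in the reply.
    for name in candidates:
        if name.lower() in text:
            return name
    # Fall back to an ordinal such as "the second one".
    for word, index in ORDINALS.items():
        if word in text and index < len(candidates):
            return candidates[index]
    return None


def select_by_touch(candidates, tapped_index):
    """Map the index of the tapped on-screen entry to a candidate."""
    if 0 <= tapped_index < len(candidates):
        return candidates[tapped_index]
    return None


if __name__ == "__main__":
    options = ["reading light", "sunroof"]
    print(build_prompt(options))  # Which device do you mean? 1. reading light, 2. sunroof
    print(select_by_voice(options, "the first one"))  # reading light
    print(select_by_touch(options, 1))  # sunroof
```

Returning None when nothing matches leaves the caller free to re-prompt rather than guess, which fits the aim of reducing misoperation.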

[0072] Optionally, the device further includes a second obtaining unit and a second control unit. The second obtaining unit is configured to obtain a second voice command issued by the target user to the target device, and the second control unit is configured to control the target device according to the second voice command.

[0073] Optionally, the second voice command includes at least one of the following types: a control command containing a pronoun that refers to the target device; a control command containing the full or partial name of the target device; and a control command containing only the control action.
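The three forms of second voice command in [0073] can be told apart with a simple rule, since the target device is already known from the earlier gesture and first voice command. A minimal sketch follows, assuming English keyword matching, a hypothetical pronoun list, and that any word of the target device's name counts as a partial name. `classify_second_command` and `route_second_command` are illustrative names. In all three cases the command is applied to the same remembered target device.

```python
import re

# Hypothetical pronoun list for commands such as "turn it off".
PRONOUNS = {"it", "this", "that"}


def classify_second_command(utterance, target_name):
    """Classify a follow-up command addressed to the already-selected device."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    name_words = set(target_name.lower().split())
    if words & PRONOUNS:
        return "pronoun"
    if words & name_words:
        return "name"  # full or partial device name
    return "action_only"


def route_second_command(utterance, target_name):
    """All three forms are routed to the target device chosen earlier."""
    return target_name, classify_second_command(utterance, target_name)


if __name__ == "__main__":
    print(route_second_command("Turn it off", "reading light"))
    # ('reading light', 'pronoun')
```

This sketch checks only the remembered target's name; distinguishing a command that names a different device would need a lookup over all cabin devices.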

[0074] The vehicle control device of the embodiment shown in Figure 3 can be used to execute the technical solution of the above method embodiment. Its implementation principle and technical effect are similar and will not be repeated here.

[0075] Figure 4 is a schematic structural diagram of a computing device according to an embodiment of this disclosure, showing a structure suitable for implementing the computing device 400 in the embodiments of this disclosure. The computing device shown in Figure 4 is merely an example and should not be construed as limiting the functionality or scope of the embodiments disclosed herein.

[0076] As shown in Figure 4, the computing device 400 may include a processing unit (e.g., a central processing unit, a graphics processing unit, etc.) 401, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage device 408 into a random access memory (RAM) 403, to implement the voice control method described in the embodiments of this disclosure. Various programs and data required for the operation of the computing device 400 are also stored in the RAM 403. The processing unit 401, the ROM 402, and the RAM 403 are interconnected via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.

[0077] Typically, the following devices can be connected to the I/O interface 405: input devices 406 including, for example, touchscreens, touchpads, keyboards, mice, cameras, microphones, accelerometers, and gyroscopes; output devices 407 including, for example, liquid crystal displays (LCDs), speakers, and vibrators; storage devices 408 including, for example, magnetic tapes and hard disks; and a communication device 409. The communication device 409 allows the computing device 400 to communicate with other devices, wirelessly or by wire, to exchange data. Although Figure 4 shows the computing device 400 with various devices, it should be understood that not all of the devices shown are required to be implemented or present; more or fewer devices may alternatively be implemented or present.

[0078] In particular, according to embodiments of this disclosure, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of this disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts, thereby implementing the voice control method described above. In such embodiments, the computer program can be downloaded and installed from a network via the communication device 409, installed from the storage device 408, or installed from the ROM 402. When the computer program is executed by the processing unit 401, it performs the functions defined in the methods of the embodiments of this disclosure.

[0079] It should be noted that the computer-readable medium described in this disclosure can be a computer-readable signal medium, a computer-readable storage medium, or any combination thereof. A computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof. In this disclosure, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In this disclosure, a computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals can take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium can be transmitted using any suitable medium, including but not limited to wires, optical fibers, RF (radio frequency), or any suitable combination thereof.

[0080] In some implementations, clients and servers can communicate using any currently known or future-developed network protocol, such as HTTP (Hypertext Transfer Protocol), and can be interconnected with digital data communication of any form or medium (e.g., a communication network). Examples of communication networks include local area networks (“LANs”), wide area networks (“WANs”), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.

[0081] The aforementioned computer-readable medium may be included in the aforementioned computing device, or it may exist independently without being assembled into the computing device.

[0082] The aforementioned computer-readable medium carries one or more programs that, when executed by the computing device, cause the computing device to obtain the gesture information of the target user in the target voice region and the first voice command in the target voice region, and to control the target device accordingly. The user can thus control a device without knowing its specific name in advance, simply by using natural gestures together with voice commands. This fundamentally solves the problem that users who are not familiar with device names cannot interact effectively by voice.

[0083] Optionally, when one or more of the above programs are executed by the computing device, the computing device may also execute other steps described in the above embodiments.

[0084] Computer program code for performing the operations of this disclosure can be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer can be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0085] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than that indicated in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, or sometimes in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified functions or operations, or using a combination of dedicated hardware and computer instructions.

[0086] The units described in the embodiments of this disclosure can be implemented in software or hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.

[0087] The functions described above in this document can be performed at least in part by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that can be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), and so on.

[0088] In the context of this disclosure, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

[0089] The above description is merely a preferred embodiment of this disclosure and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of this disclosure is not limited to technical solutions formed by specific combinations of the above-described technical features, but also covers other technical solutions formed by arbitrary combinations of the above-described technical features or their equivalents without departing from the above-described concept, for example, technical solutions formed by substituting the above features with technical features having similar functions disclosed in (but not limited to) this disclosure.

[0090] Furthermore, while the operations are described in a specific order, this should not be construed as requiring these operations to be performed in the specific order shown or in a sequential order. In certain environments, multitasking and parallel processing may be advantageous. Similarly, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of this disclosure. Certain features described in the context of individual embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented individually or in any suitable sub-combination in multiple embodiments.

[0091] Although the subject matter has been described in language specific to structural features and/or logical acts of methods, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.

[0092] It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0093] The above description is merely a specific embodiment of this disclosure, enabling those skilled in the art to understand or implement it. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this disclosure. Therefore, this disclosure is not to be limited to the embodiments described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A vehicle control method, characterized in that the method comprises: obtaining gesture information and a first voice command of a target user corresponding to an awakened target voice region; and controlling a target device based on the gesture information and the first voice command.

2. The method of claim 1, wherein controlling the target device based on the gesture information and the first voice command comprises: identifying the user's gesture direction from image information of the target audio region; determining the target device from candidate devices based on the gesture direction, the candidate devices being at least one device inside the vehicle on the path to which the gesture direction points; and controlling the target device according to the first voice command.

3. The method of claim 2, wherein the at least one device is a plurality of candidate devices, and determining the target device from the candidate devices based on the gesture direction comprises: outputting information about the plurality of candidate devices by voice; obtaining a voice selection instruction for the plurality of candidate devices; and determining the target device from the plurality of candidate devices according to the voice selection instruction.

4. The method of claim 2, wherein the at least one device is a plurality of candidate devices, and determining the target device from the candidate devices based on the gesture direction comprises: displaying information about the plurality of candidate devices on the touchscreen in the target audio region; obtaining a touch selection instruction for the displayed information; and determining the target device from the plurality of candidate devices according to the touch selection instruction.

5. The method according to any one of claims 1-4, characterized in that, after the target device is controlled using the first voice command, the method further comprises: obtaining a second voice command issued by the target user to the target device; and controlling the target device according to the second voice command.

6. The method according to claim 5, characterized in that the second voice command includes at least one of the following types: a control command containing a pronoun that refers to the target device; a control command containing the full or partial name of the target device; and a control command containing only the control action.

7. A vehicle control method, characterized in that the method comprises: if an area is a non-target sound zone, not responding to the gesture information and voice commands of users corresponding to the non-target sound zone, the non-target sound zone being a sound zone that has not been awakened; and if the area is the target sound zone, controlling a target device in response to the gesture information and a first voice command of a target user corresponding to the target sound zone, the target sound zone being the sound zone activated by a wake-up command.

8. A vehicle control method, characterized in that the method comprises: after controlling a target device based on gesture information and a first voice command, obtaining a control command issued to the target device; and controlling the target device according to the control command.

9. The method according to claim 8, characterized in that the control command comes from the target user in the target audio zone and/or from users in other audio zones within the vehicle.

10. The method according to claim 8 or 9, characterized in that the control command is a second voice command and/or a gesture command.

11. A computing device, comprising: a memory; a processor; and a computer program, wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method according to any one of claims 1-10.

12. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, the method according to any one of claims 1-10 is implemented.

13. A computer program product, characterized in that the computer program product comprises a computer program or instructions that, when executed by a processor, implement the method according to any one of claims 1-10.

14. A vehicle, characterized in that it comprises the computing device according to claim 11.