Gesture control optimization methods, devices, terminals, and storage media

Collecting user data and generating target gesture scene fusion data on the terminal side to optimize the gesture control model avoids the privacy leakage caused by uploading user data to the cloud and achieves more accurate, personalized gesture control optimization.

CN115393676B · Active · Publication Date: 2026-06-30 · HUAWEI TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
HUAWEI TECH CO LTD
Filing Date
2021-05-07
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Current technologies for optimizing gesture control require uploading user data to the cloud, which poses a risk of leaking user privacy.

Method used

The system collects the user's original gesture data and target scene data on the terminal side, generates target gesture scene fusion data, optimizes the gesture control model using the target gesture scene fusion data, including gesture category labels and background category labels, and performs training and optimization directly on the terminal side.

Benefits of technology

User privacy is protected, and the generated target gesture scene fusion data, with rich backgrounds and different categories, allows the gesture control model to be optimized more accurately and personalized for the same user.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN115393676B_ABST

Abstract

This application provides a gesture control optimization method, device, terminal, and storage medium. The method includes: collecting raw gesture data and target scene data from a user; generating target gesture scene fusion data based on the raw gesture data, the target scene data, and target gesture key point data, wherein the target gesture scene fusion data is used to optimize the gesture control model on the terminal side. The technical solution provided by this application has the following advantages: 1) The gesture control model is trained and optimized directly on the terminal side, and user data does not need to be uploaded to the cloud, which can better protect user privacy; 2) The target gesture scene fusion data has built-in tags and forms multiple potential gesture usage scenarios, which can more accurately optimize the gesture control model; 3) The target gesture scene fusion data usually comes from the same user, which can accurately optimize the gesture control model for that user.

Description

Technical Field

[0001] This application relates to the field of artificial intelligence (AI) technology, and more specifically to a gesture control optimization method, device, terminal, and storage medium.

Background Technology

[0002] Gesture control is a type of human-computer interaction technology. Compared to traditional mouse and keyboard input, gesture control does not require the user to hold a specific input device; they can control the device or input specific information simply through specific hand movements. Due to the convenience and fun of contactless gestures, they are being widely used in the industry to control computer terminals, mobile terminals, television terminals, and more.

[0003] When users control devices using gestures, the gesture control model needs to be optimized to improve the accuracy of gesture control and thus enhance the user experience. Current technologies typically collect user data on the terminal side, then upload the data to the cloud. The cloud then optimizes the gesture control model based on the uploaded data, and finally redeploys the optimized gesture control model to the terminal, thus optimizing the terminal-side gesture control model.

[0004] However, user data often contains users' private information, and the above methods require uploading user data to the cloud, which poses a risk of leaking user privacy.

Summary of the Invention

[0005] In view of this, this application provides a gesture control optimization method, device, terminal, and storage medium to help solve the problem that gesture control optimization in the prior art requires uploading user data to the cloud, which poses a risk of leaking user privacy.

[0006] In a first aspect, embodiments of this application provide a gesture control optimization method applied to a terminal. The method includes: collecting original gesture data and target scene data from a user, wherein the target scene data is used to characterize background information associated with the original gesture data; generating target gesture scene fusion data based on the original gesture data, the target scene data, and target gesture key point data, wherein the target gesture scene fusion data is used to optimize a gesture control model; wherein the target gesture scene fusion data includes gesture category labels and background category labels, wherein the gesture category labels are matched with the target gesture key point data, and the background category labels are matched with the target gesture scene fusion data.

[0007] Preferably, generating target gesture scene fusion data based on the original gesture data, the target scene data, and the target gesture key point data includes: inputting the original gesture data, the target scene data, and the target gesture key point data into a first gesture data generation model to generate target gesture scene fusion data.

[0008] Preferably, generating target gesture scene fusion data based on the original gesture data, the target scene data, and the target gesture key point data includes: inputting the original gesture data and the target gesture key point data into a second gesture data generation model to generate target gesture data; and inputting the target gesture data and the target scene data into a third gesture data generation model to generate target gesture scene fusion data.

[0009] Preferably, before generating target gesture scene fusion data based on the original gesture data, the target scene data, and the target gesture key point data, the method further includes: calling the target gesture key point generation model to generate target gesture key point data.

[0010] Preferably, after generating target gesture scene fusion data based on the original gesture data, the target scene data, and the target gesture key point data, the method further includes: training the gesture control model using the target gesture scene fusion data to optimize the gesture control model, wherein the gesture control model is used to recognize the user's gesture control operations.

[0011] Preferably, the collection of the user's original gesture data and target scene data includes: collecting the user's original gesture data and target scene data when the user performs a gesture control operation.
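The first-aspect method in paragraphs [0006] to [0011] can be pictured with a small sketch. This is only an illustration under stated assumptions, not the implementation of this application: every name (`FusionSample`, `first_model`, `second_model`, `third_model`, `generate_fusion_data`) is hypothetical, images and key points are represented by plain strings, and the generation models are placeholders. The background label is assumed here to come from the scene each sample was built on. The sketch shows the single-model variant of [0007], the cascaded variant of [0008], and how each generated sample carries both labels.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List

# Hypothetical stand-ins: real data would be image tensors and key point arrays.
Image = str
KeyPoints = str


@dataclass(frozen=True)
class FusionSample:
    image: Image           # generated gesture-in-scene image
    gesture_label: str     # matches the target gesture key point data
    background_label: str  # assumed to follow the scene used for this sample


def second_model(raw_gesture: Image, key_points: KeyPoints) -> Image:
    """Placeholder: re-pose the user's hand according to the target key points."""
    return f"hand({raw_gesture}->{key_points})"


def third_model(target_gesture: Image, scene: Image) -> Image:
    """Placeholder: place the target gesture on the scene background."""
    return f"{target_gesture}@{scene}"


def first_model(raw_gesture: Image, scene: Image, key_points: KeyPoints) -> Image:
    """Placeholder single-model variant: three inputs in, fusion image out."""
    return f"hand({raw_gesture}->{key_points})@{scene}"


def cascaded(raw_gesture: Image, scene: Image, key_points: KeyPoints) -> Image:
    """Cascaded variant: second model first, then third model."""
    return third_model(second_model(raw_gesture, key_points), scene)


def generate_fusion_data(
    raw_gesture: Image,
    scenes: Dict[str, Image],
    key_point_sets: Dict[str, KeyPoints],
    fuse: Callable[[Image, Image, KeyPoints], Image],
) -> List[FusionSample]:
    """Cross every target gesture with every scene; labels come from the dict keys."""
    samples = []
    for (g_label, kp), (b_label, scene) in product(key_point_sets.items(), scenes.items()):
        samples.append(FusionSample(fuse(raw_gesture, scene, kp), g_label, b_label))
    return samples


scenes = {"office": "s1", "street": "s2"}
key_point_sets = {"fist": "kp1", "palm": "kp2", "ok": "kp3"}
samples = generate_fusion_data("raw0", scenes, key_point_sets, cascaded)
```

In this sketch one raw gesture image, two scenes, and three key point sets yield six labeled samples, all produced locally; a training step on the terminal would then consume `samples` without any upload.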

[0012] In a second aspect, embodiments of this application provide a gesture control optimization device, comprising: a data acquisition module for acquiring original gesture data and target scene data of a user, wherein the target scene data is used to characterize background information associated with the original gesture data; and a gesture data generation module for generating target gesture scene fusion data based on the original gesture data, the target scene data, and target gesture key point data, wherein the target gesture scene fusion data is used to optimize a gesture control model; wherein the target gesture scene fusion data includes gesture category labels and background category labels, wherein the gesture category labels are matched with the target gesture key point data, and the background category labels are matched with the target gesture scene fusion data.

[0013] Preferably, the gesture data generation module is specifically used to: input the original gesture data, the target scene data, and the target gesture key point data into the first gesture data generation model to generate target gesture scene fusion data.

[0014] Preferably, the gesture data generation module is specifically used to: input the original gesture data and the target gesture key point data into a second gesture data generation model to generate target gesture data; and input the target gesture data and the target scene data into a third gesture data generation model to generate target gesture scene fusion data.

[0015] Preferably, it further includes: a target gesture key point data generation module, used to call the target gesture key point generation model to generate target gesture key point data.

[0016] Preferably, it further includes: a training module, used to train the gesture control model using the target gesture scene fusion data, and to optimize the gesture control model, wherein the gesture control model is used to recognize the user's gesture control operations.

[0017] Preferably, the data acquisition module is specifically used to: acquire the user's original gesture data and target scene data when the user performs a gesture control operation.

[0018] In a third aspect, embodiments of this application provide a terminal, including: one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory, and the one or more computer programs include instructions that, when executed by the terminal, cause the terminal to perform the method described in any one of the first aspects.

[0019] In a fourth aspect, embodiments of this application provide a computer-readable storage medium including a stored program, wherein, when the program is executed, it controls the device where the computer-readable storage medium is located to perform the method described in any one of the first aspects.

[0020] The gesture control optimization scheme provided in this application has the following advantages:

[0021] 1) The gesture control model is trained and optimized directly on the terminal side, and user data does not need to be uploaded to the cloud, which can better protect user privacy;

[0022] 2) The target gesture key point data guides the size and shape of the gesture, and the target scene data replaces the background of the gesture to generate target gesture scene fusion data with rich backgrounds and different categories. This target gesture scene fusion data comes with labels and forms a variety of potential gesture usage scenarios, which can more accurately optimize the gesture control model.

[0023] 3) The target gesture scene fusion data usually comes from the same user, which can accurately optimize the gesture control model for that user.

Description of the Drawings

[0024] To more clearly illustrate the technical solutions of the embodiments of this application, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0025] Figure 1 is a schematic diagram of an application scenario provided by an embodiment of this application;

[0026] Figure 2 is a schematic diagram of the structure of a terminal provided in an embodiment of this application;

[0027] Figure 3 is a schematic diagram of a gesture control scenario provided in an embodiment of this application;

[0028] Figure 4 is a schematic diagram of another gesture control scenario provided in an embodiment of this application;

[0029] Figure 5 is a schematic diagram of a gesture control optimization scheme in related technologies;

[0030] Figure 6 is a schematic diagram of a gesture control optimization scheme in related technologies;

[0031] Figure 7 is a schematic diagram of a data fusion scenario provided in an embodiment of this application;

[0032] Figure 8 is a schematic flowchart of a gesture control optimization method provided in an embodiment of this application;

[0033] Figure 9 is a schematic diagram of a feature fusion scenario based on an ensemble model provided in an embodiment of this application;

[0034] Figure 10 is a schematic diagram of a feature fusion scenario based on a cascaded model provided in an embodiment of this application;

[0035] Figure 11 is a schematic diagram of the structure of a gesture control optimization device provided in an embodiment of this application.

Detailed Description

[0036] To better understand the technical solution of this application, the embodiments of this application will be described in detail below with reference to the accompanying drawings.

[0037] It should be understood that the described embodiments are merely some, not all, of the embodiments in this application. All other embodiments obtained by those skilled in the art based on the embodiments in this application without inventive effort are within the scope of protection of this application.

[0038] The terminology used in the embodiments of this application is for the purpose of describing particular embodiments only and is not intended to be limiting of this application. The singular forms “a,” “an,” and “the” used in the embodiments of this application and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise.

[0039] It should be understood that the term "and / or" used in this document merely describes an association between related objects and indicates that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. Additionally, the character " / " in this document generally indicates that the preceding and following related objects have an "or" relationship.

[0040] Figure 1 is a schematic diagram of an application scenario provided by an embodiment of this application. In Figure 1, a mobile phone 100 is used as an example of the terminal. It is understood that, in addition to the mobile phone 100, the terminal involved in the embodiments of this application may also be a tablet computer, personal computer (PC), personal digital assistant (PDA), smartwatch, netbook, wearable electronic device, augmented reality (AR) device, virtual reality (VR) device, in-vehicle device, smart car, smart speaker, robot, smart glasses, smart TV, etc.

[0041] Figure 2 is a schematic diagram of the structure of a terminal provided in an embodiment of this application. The terminal 200 may be the server device 101 in Figure 1, or the terminal device 102 in Figure 1.

[0042] Terminal 200 may include a processor 210, an external memory interface 220, an internal memory 221, a universal serial bus (USB) interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 250, a wireless communication module 260, an audio module 270, a speaker 270A, a receiver 270B, a microphone 270C, a headphone jack 270D, a sensor module 280, buttons 290, a motor 291, an indicator 292, a camera 293, a display screen 294, and a subscriber identification module (SIM) card interface 295, etc. The sensor module 280 may include a pressure sensor 280A, a gyroscope sensor 280B, a barometric pressure sensor 280C, a magnetic sensor 280D, an accelerometer sensor 280E, a distance sensor 280F, a proximity sensor 280G, a fingerprint sensor 280H, a temperature sensor 280J, a touch sensor 280K, an ambient light sensor 280L, a bone conduction sensor 280M, etc.

[0043] It is understood that the structure illustrated in the embodiments of the present invention does not constitute a specific limitation on the terminal 200. In other embodiments of this application, the terminal 200 may include more or fewer components than illustrated, or combine some components, or split some components, or have different component arrangements. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

[0044] Processor 210 may include one or more processing units, such as application processors (APs), modem processors, graphics processing units (GPUs), image signal processors (ISPs), controllers, video codecs, digital signal processors (DSPs), baseband processors, and / or neural network processing units (NPUs). These different processing units may be independent devices or integrated into one or more processors.

[0045] The controller can generate operation control signals based on the instruction opcode and timing signals to complete the control of instruction fetching and execution.

[0046] The processor 210 may also include a memory for storing instructions and data. In some embodiments, the memory in the processor 210 is a cache memory. This memory can store instructions or data that the processor 210 has just used or that are used repeatedly. If the processor 210 needs to use the instruction or data again, it can directly retrieve it from the memory. This avoids repeated accesses, reduces the waiting time of the processor 210, and thus improves the efficiency of the system.
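The effect described in [0046] can be illustrated with a software analogy. The sketch below is not a model of the hardware cache in the processor 210; it only shows, using Python's standard `functools.lru_cache`, how keeping recently used values close at hand avoids repeated trips to a slower store. The names `slow_fetch` and `cached_fetch` are hypothetical.

```python
from functools import lru_cache

fetch_count = 0  # counts how often the slow backing store is actually accessed


def slow_fetch(address: int) -> int:
    """Stand-in for a slow memory access; the returned value is arbitrary."""
    global fetch_count
    fetch_count += 1
    return address * 2


@lru_cache(maxsize=4)  # small cache holding the most recently used results
def cached_fetch(address: int) -> int:
    return slow_fetch(address)


# Addresses 1 and 2 are requested repeatedly; only the first request of each
# address reaches the slow store, the rest are served from the cache.
values = [cached_fetch(a) for a in (1, 2, 1, 1, 2)]
```

After the five requests, `fetch_count` is 2 and `cached_fetch.cache_info().hits` is 3, which corresponds to the reduced waiting time the paragraph describes.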

[0047] In some embodiments, the processor 210 may include one or more interfaces. Interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver / transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input / output (GPIO) interface, a subscriber identity module (SIM) interface, and / or a universal serial bus (USB) interface, etc.

[0048] The I2C interface is a bidirectional synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 210 may include multiple I2C buses. The processor 210 can couple to the touch sensor 280K, charger, flash, camera 293, etc., through different I2C bus interfaces. For example, the processor 210 can couple to the touch sensor 280K through the I2C interface, enabling the processor 210 and the touch sensor 280K to communicate through the I2C bus interface, thereby realizing the touch function of the terminal 200.

[0049] The I2S interface can be used for audio communication. In some embodiments, the processor 210 may include multiple I2S buses. The processor 210 can be coupled to the audio module 270 via the I2S bus to enable communication between the processor 210 and the audio module 270. In some embodiments, the audio module 270 can transmit audio signals to the wireless communication module 260 via the I2S interface to enable the function of answering phone calls through a Bluetooth headset.

[0050] The PCM interface can also be used for audio communication, sampling, quantizing, and encoding analog signals. In some embodiments, the audio module 270 and the wireless communication module 260 can be coupled via the PCM bus interface. In some embodiments, the audio module 270 can also transmit audio signals to the wireless communication module 260 via the PCM interface, enabling the function of answering phone calls through a Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
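The three PCM steps named in [0050], sampling, quantizing, and encoding, can be sketched as follows. This is a minimal illustration assuming 8-bit unsigned linear PCM and an input signal in the range [-1, 1]; the function name `pcm_encode` and the chosen sample rate are assumptions, not details from this application.

```python
import math
from typing import Callable


def pcm_encode(signal: Callable[[float], float], sample_rate_hz: int, n_samples: int) -> bytes:
    """Sample `signal`, quantize each sample to 256 levels, and pack as bytes."""
    levels = 256  # 8-bit quantization (assumption)
    codes = []
    for i in range(n_samples):
        t = i / sample_rate_hz                     # sampling instant
        x = max(-1.0, min(1.0, signal(t)))         # sampling, clipped to [-1, 1]
        code = int((x + 1.0) / 2.0 * levels)       # quantizing to an integer level
        codes.append(min(levels - 1, code))        # keep full scale inside 0..255
    return bytes(codes)                            # encoding as one byte per sample


def tone(t: float) -> float:
    """1 kHz sine tone used as the analog input."""
    return math.sin(2 * math.pi * 1000 * t)


pcm = pcm_encode(tone, sample_rate_hz=8000, n_samples=8)
```

With an 8 kHz sample rate, one period of the 1 kHz tone is covered by eight codes: the zero crossing at the start maps to the mid-scale code 128, the positive peak to 255, and the negative peak to 0.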

[0051] The UART interface is a universal serial data bus used for asynchronous communication. This bus can be a bidirectional communication bus. It converts the data to be transmitted between serial and parallel communication. In some embodiments, the UART interface is typically used to connect the processor 210 and the wireless communication module 260. For example, the processor 210 communicates with the Bluetooth module in the wireless communication module 260 via the UART interface to implement Bluetooth functionality. In some embodiments, the audio module 270 can transmit audio signals to the wireless communication module 260 via the UART interface to enable music playback through Bluetooth headphones.
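The serial/parallel conversion mentioned in [0051] can be sketched for a single byte. The paragraph does not specify a frame format; the sketch assumes the common 8N1 configuration (one start bit, eight data bits sent least significant bit first, no parity, one stop bit). The function names are hypothetical.

```python
from typing import List


def uart_frame(byte: int) -> List[int]:
    """Parallel to serial: turn one byte into a 10-bit 8N1 frame."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    data_bits = [(byte >> i) & 1 for i in range(8)]  # least significant bit first
    return [0] + data_bits + [1]                     # start bit, data, stop bit


def uart_unframe(bits: List[int]) -> int:
    """Serial to parallel: recover the byte from a 10-bit 8N1 frame."""
    if len(bits) != 10 or bits[0] != 0 or bits[-1] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


frame = uart_frame(0x41)        # ASCII 'A'
restored = uart_unframe(frame)  # round trip back to the original byte
```

The start and stop bits are what allow the receiver to find byte boundaries without a shared clock line, which is why the interface is called asynchronous.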

[0052] The MIPI interface can be used to connect the processor 210 to peripheral devices such as the display screen 294 and the camera 293. The MIPI interface includes a camera serial interface (CSI) and a display serial interface (DSI). In some embodiments, the processor 210 and the camera 293 communicate via the CSI interface to enable the shooting function of the terminal 200. The processor 210 and the display screen 294 communicate via the DSI interface to enable the display function of the terminal 200.

[0053] The GPIO interface can be configured via software. It can be configured as a control signal or a data signal. In some embodiments, the GPIO interface can be used to connect the processor 210 to a camera 293, a display screen 294, a wireless communication module 260, an audio module 270, a sensor module 280, etc. The GPIO interface can also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, etc.

[0054] The USB interface 230 is an interface that conforms to the USB standard, and may specifically be a Mini USB interface, a Micro USB interface, or a USB Type-C interface. The USB interface 230 can be used to connect a charger to charge the terminal 200, and can also be used for data transfer between the terminal 200 and peripheral devices. It can also be used to connect headphones for audio playback, or to connect other terminals, such as AR devices.

[0055] It is understood that the interface connection relationships between the modules illustrated in the embodiments of the present invention are merely illustrative and do not constitute a structural limitation on the terminal 200. In other embodiments of this application, the terminal 200 may also employ different interface connection methods or combinations of multiple interface connection methods as described in the above embodiments.

[0056] The charging management module 240 receives charging input from a charger. The charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 240 receives charging input from the wired charger via the USB interface 230. In some wireless charging embodiments, the charging management module 240 receives wireless charging input via the wireless charging coil of the terminal 200. While charging the battery 242, the charging management module 240 can also supply power to the terminal via the power management module 241.

[0057] The power management module 241 connects the battery 242, the charging management module 240, and the processor 210. The power management module 241 receives input from the battery 242 and / or the charging management module 240, providing power to the processor 210, internal memory 221, display screen 294, camera 293, and wireless communication module 260, etc. The power management module 241 can also monitor parameters such as battery capacity, battery cycle count, and battery health status (leakage current, impedance). In some other embodiments, the power management module 241 may also be located within the processor 210. In other embodiments, the power management module 241 and the charging management module 240 may be located in the same device.

[0058] The wireless communication function of terminal 200 can be implemented through antenna 1, antenna 2, mobile communication module 250, wireless communication module 260, modem processor and baseband processor, etc.

[0059] Antenna 1 and antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in terminal 200 can be used to cover one or more communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization. For example, antenna 1 can be multiplexed as a diversity antenna for a wireless local area network. In some other embodiments, the antennas can be used in conjunction with a tuning switch.

[0060] The mobile communication module 250 can provide solutions for wireless communication applications including 2G / 3G / 4G / 5G on the terminal 200. The mobile communication module 250 may include at least one filter, switch, power amplifier, low-noise amplifier (LNA), etc. The mobile communication module 250 can receive electromagnetic waves via antenna 1, and perform filtering, amplification, and other processing on the received electromagnetic waves before transmitting them to a modem processor for demodulation. The mobile communication module 250 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves for radiation via antenna 1. In some embodiments, at least some functional modules of the mobile communication module 250 may be housed in the processor 210. In some embodiments, at least some functional modules of the mobile communication module 250 and at least some modules of the processor 210 may be housed in the same device.

[0061] The modem processor may include a modulator and a demodulator. The modulator modulates the low-frequency baseband signal to be transmitted into a mid-to-high frequency signal. The demodulator demodulates the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing. After processing by the baseband processor, the low-frequency baseband signal is transmitted to the application processor. The application processor outputs sound signals through an audio device (not limited to speaker 270A, receiver 270B, etc.) or displays images or videos through the display screen 294. In some embodiments, the modem processor may be a separate device. In other embodiments, the modem processor may be independent of the processor 210 and may be housed in the same device as the mobile communication module 250 or other functional modules.

[0062] The wireless communication module 260 can provide solutions for wireless communication applications on the terminal 200, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technologies. The wireless communication module 260 can be one or more devices integrating at least one communication processing module. The wireless communication module 260 receives electromagnetic waves via antenna 2, performs frequency modulation and filtering of the electromagnetic wave signals, and sends the processed signal to processor 210. The wireless communication module 260 can also receive signals to be transmitted from processor 210, perform frequency modulation and amplification, and convert them into electromagnetic waves for radiation via antenna 2.

[0063] In some embodiments, antenna 1 of terminal 200 is coupled to mobile communication module 250, and antenna 2 is coupled to wireless communication module 260, enabling terminal 200 to communicate with networks and other devices via wireless communication technology. The wireless communication technology may include Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Time-Division Synchronous Code Division Multiple Access (TD-SCDMA), Long Term Evolution (LTE), BT, GNSS, WLAN, NFC, FM, and / or IR technologies, etc. The GNSS may include the Global Positioning System (GPS), the Global Navigation Satellite System (GLONASS), the BeiDou Navigation Satellite System (BDS), the Quasi-Zenith Satellite System (QZSS), and / or satellite-based augmentation systems (SBAS).

[0064] Terminal 200 implements display functions through a GPU, display screen 294, and application processor. The GPU is a microprocessor for image processing, connected to the display screen 294 and the application processor. The GPU is used to perform mathematical and geometric calculations and for graphics rendering. Processor 210 may include one or more GPUs, which execute program instructions to generate or modify display information.

[0065] Display screen 294 is used to display images, videos, etc. Display screen 294 includes a display panel. The display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini LED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), etc. In some embodiments, terminal 200 may include one or N display screens 294, where N is a positive integer greater than 1.

[0066] Terminal 200 can perform shooting functions through ISP, camera 293, video codec, GPU, display 294 and application processor.

[0067] The ISP (Image Signal Processor) is used to process data fed back from the camera 293. For example, when taking a picture, the shutter is opened, and light is transmitted through the lens to the camera's photosensitive element. The light signal is converted into an electrical signal, and the camera's photosensitive element transmits the electrical signal to the ISP for processing, transforming it into an image visible to the naked eye. The ISP can also perform algorithmic optimization on image noise, brightness, and skin tone. The ISP can also optimize parameters such as exposure and color temperature of the shooting scene. In some embodiments, the ISP can be set in the camera 293.

[0068] Camera 293 is used to capture still images or videos. An object generates an optical image through the lens, and the optical image is projected onto a photosensitive element. The photosensitive element can be a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the light signal into an electrical signal, which is then transmitted to an ISP for conversion into a digital image signal. The ISP outputs the digital image signal to a DSP for processing. The DSP converts the digital image signal into image signals in standard RGB, YUV, or other formats. In some embodiments, terminal 200 may include one or N cameras 293, where N is a positive integer greater than 1.
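As a small illustration of the relationship between the RGB and YUV formats mentioned in [0068], the sketch below converts one normalized RGB pixel to YUV. The application does not state which YUV variant is used; the coefficients here are the widely used BT.601 luma weights with the classic analog U and V scale factors, chosen as an assumption. The function name `rgb_to_yuv` is hypothetical.

```python
from typing import Tuple


def rgb_to_yuv(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Convert one pixel with components in [0, 1] from RGB to YUV (BT.601 weights)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luma: weighted sum of the three channels
    u = 0.492 * (b - y)                    # blue-difference chroma
    v = 0.877 * (r - y)                    # red-difference chroma
    return y, u, v


white = rgb_to_yuv(1.0, 1.0, 1.0)  # neutral color: full luma, chroma near zero
red = rgb_to_yuv(1.0, 0.0, 0.0)    # saturated red: low luma, strong positive V
```

Separating brightness (Y) from color (U, V) is what makes YUV convenient for video codecs, which can store the chroma components at lower resolution.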

[0069] A digital signal processor (DSP) is used to process digital signals. Besides digital image signals, it can also process other digital signals. For example, when terminal 200 selects a frequency point, the DSP can perform a Fourier transform on the energy of that frequency point.
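The Fourier-transform example in [0069] can be made concrete with a direct evaluation of one discrete Fourier transform (DFT) bin. This is only a sketch of the mathematics, not of how the DSP in terminal 200 is implemented; the function name `bin_energy` and the test signal are assumptions.

```python
import cmath
import math
from typing import Sequence


def bin_energy(samples: Sequence[float], k: int) -> float:
    """Energy (squared magnitude) of DFT bin k, computed directly from the definition."""
    n = len(samples)
    coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
    return abs(coeff) ** 2


# Test signal: a cosine that completes exactly 3 cycles over 32 samples.
n = 32
samples = [math.cos(2 * math.pi * 3 * i / n) for i in range(n)]

energy_at_3 = bin_energy(samples, 3)  # bin matching the signal frequency
energy_at_5 = bin_energy(samples, 5)  # bin with no signal content
```

For this signal the energy at bin 3 is approximately (n/2)² = 256, while the energy at bin 5 is approximately zero, so comparing bin energies indicates which frequency point carries the signal.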

[0070] Video codecs are used to compress or decompress digital video. Terminal 200 may support one or more video codecs. Thus, terminal 200 can play or record videos in various encoding formats, such as Moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, etc.

[0071] NPU stands for Neural Network (NN) Computing Processor. By borrowing the structure of biological neural networks, such as the transmission patterns between neurons in the human brain, it can rapidly process input information and continuously learn on its own. NPUs can enable intelligent cognitive applications in terminals, such as image recognition, facial recognition, speech recognition, and text understanding.

[0072] The external storage interface 220 can be used to connect an external storage card, such as a Micro SD card, to expand the storage capacity of the terminal 200. The external storage card communicates with the processor 210 through the external storage interface 220 to perform data storage functions. For example, music, video, and other files can be saved on the external storage card.

[0073] Internal memory 221 can be used to store computer executable program code, which includes instructions. Internal memory 221 can include a program storage area and a data storage area. The program storage area can store the operating system, at least one application program required for a function (such as sound playback, image playback, etc.), etc. The data storage area can store data created during the use of terminal 200 (such as audio data, phonebook, etc.). Furthermore, internal memory 221 can include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, universal flash storage (UFS), etc. Processor 210 executes various functional applications and data processing of terminal 200 by running instructions stored in internal memory 221 and / or instructions stored in memory located in the processor.

[0074] Terminal 200 can implement audio functions, such as music playback and recording, through audio module 270, speaker 270A, receiver 270B, microphone 270C, headphone jack 270D, and application processor.

[0075] The audio module 270 is used to convert digital audio information into analog audio signals for output, and also to convert analog audio input into digital audio signals. The audio module 270 can also be used for encoding and decoding audio signals. In some embodiments, the audio module 270 may be located in the processor 210, or some functional modules of the audio module 270 may be located in the processor 210.

[0076] The speaker 270A, also known as a "loudspeaker," is used to convert audio electrical signals into sound signals. The user can listen to music or take hands-free calls on the terminal 200 through the speaker 270A.

[0077] The receiver 270B, also known as the "earpiece," is used to convert audio electrical signals into sound signals. When the terminal 200 answers a phone call or plays a voice message, the receiver 270B can be brought close to the user's ear to listen to the voice.

[0078] Microphone 270C, also known as a "mic" or "sound transducer," is used to convert sound signals into electrical signals. When making a phone call or sending a voice message, the user can speak with their mouth close to microphone 270C, inputting the sound signal into microphone 270C. Terminal 200 may have at least one microphone 270C. In some embodiments, terminal 200 may have two microphones 270C, which, in addition to collecting sound signals, can also perform noise reduction. In other embodiments, terminal 200 may have three, four, or more microphones 270C, which can collect sound signals, reduce noise, identify the sound source, and perform directional recording, etc.

[0079] The headphone jack 270D is used to connect wired headphones. The headphone jack 270D can be the USB interface 230, a 3.5 mm Open Mobile Terminal Platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.

[0080] Pressure sensor 280A is used to sense pressure signals and convert them into electrical signals. In some embodiments, pressure sensor 280A can be disposed on display screen 294. There are many types of pressure sensors 280A, such as resistive pressure sensors, inductive pressure sensors, and capacitive pressure sensors. A capacitive pressure sensor may include at least two parallel plates with conductive material. When force is applied to pressure sensor 280A, the capacitance between the electrodes changes. Terminal 200 determines the pressure intensity based on the change in capacitance. When a touch operation is applied to display screen 294, terminal 200 detects the intensity of the touch operation based on pressure sensor 280A. Terminal 200 can also calculate the touch position based on the detection signal from pressure sensor 280A. In some embodiments, touch operations applied to the same touch position but with different touch operation intensities can correspond to different operation commands. For example: when a touch operation with an intensity less than a first pressure threshold is applied to the SMS application icon, a command to view an SMS is executed. When a touch operation with an intensity greater than or equal to the first pressure threshold is applied to the SMS application icon, a command to create a new SMS is executed.
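
The SMS example above amounts to a single threshold comparison. A minimal Python sketch follows; the threshold value, the normalized intensity scale, and the command names are assumptions made for illustration.

```python
# Hypothetical first pressure threshold on a normalized 0..1 intensity scale.
FIRST_PRESSURE_THRESHOLD = 0.5


def sms_icon_command(intensity, threshold=FIRST_PRESSURE_THRESHOLD):
    """Map the touch intensity on the SMS application icon to a command."""
    if intensity < threshold:
        return "view_sms"      # intensity below the first pressure threshold
    return "create_sms"        # intensity greater than or equal to the threshold
```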

[0081] The gyroscope sensor 280B can be used to determine the motion attitude of the terminal 200. In some embodiments, the gyroscope sensor 280B can determine the angular velocity of the terminal 200 around three axes (i.e., the x, y, and z axes). The gyroscope sensor 280B can be used for image stabilization. For example, when the shutter is pressed, the gyroscope sensor 280B detects the angle of the terminal 200's shake, calculates the distance that the lens module needs to compensate based on the angle, and allows the lens to counteract the shake of the terminal 200 through reverse movement, thus achieving image stabilization. The gyroscope sensor 280B can also be used in navigation and motion-sensing game scenarios.

[0082] The barometric pressure sensor 280C is used to measure air pressure. In some embodiments, the terminal 200 calculates altitude using the air pressure value measured by the barometric pressure sensor 280C to assist in positioning and navigation.
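
One common way to convert a pressure reading into an altitude estimate is the international barometric formula. The sketch below assumes a sea-level reference pressure of 1013.25 hPa and standard-atmosphere constants; this description does not state which formula terminal 200 uses, so the code is only an example.

```python
SEA_LEVEL_HPA = 1013.25  # assumed reference pressure at sea level


def altitude_m(pressure_hpa, sea_level_hpa=SEA_LEVEL_HPA):
    """Estimate altitude in metres from air pressure in hPa."""
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))
```

Lower pressure readings map to higher altitude estimates, which is the property the positioning aid relies on.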

[0083] The magnetic sensor 280D includes a Hall sensor. The terminal 200 can use the magnetic sensor 280D to detect the opening and closing of a flip cover case. In some embodiments, when the terminal 200 is a flip phone, the terminal 200 can detect the opening and closing of the flip using the magnetic sensor 280D. Then, based on the detected open or closed state of the case or the flip, features such as automatic unlocking when the flip is opened can be set.

[0084] The accelerometer 280E can detect the magnitude of acceleration of the terminal 200 in various directions (typically three axes). When the terminal 200 is stationary, it can detect the magnitude and direction of gravity. It can also be used to identify the terminal's posture and can be applied to applications such as screen orientation switching and pedometers.

[0085] A distance sensor 280F is used to measure distance. Terminal 200 can measure distance via infrared or laser. In some embodiments, during a shooting scene, terminal 200 can utilize the distance sensor 280F for distance measurement to achieve fast focusing.

[0086] The proximity sensor 280G may include, for example, a light-emitting diode (LED) and a light detector, such as a photodiode. The LED may be an infrared LED. The terminal 200 emits infrared light outward through the LED. The terminal 200 uses the photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the terminal 200. When insufficient reflected light is detected, the terminal 200 can determine that there is no object near the terminal 200. The terminal 200 can use the proximity sensor 280G to detect when the user holds the terminal 200 close to their ear for a call, so as to automatically turn off the screen to save power. The proximity sensor 280G can also be used in holster mode and pocket mode for automatic unlocking and screen locking.

[0087] The ambient light sensor 280L is used to sense the ambient light intensity. The terminal 200 can adaptively adjust the brightness of the display screen 294 according to the sensed ambient light intensity. The ambient light sensor 280L can also be used to automatically adjust the white balance when taking pictures. The ambient light sensor 280L can also work with the proximity sensor 280G to detect whether the terminal 200 is in a pocket to prevent accidental touches.

[0088] The fingerprint sensor 280H is used to collect fingerprints. The terminal 200 can use the characteristics of the collected fingerprints to achieve fingerprint unlocking, access to application locks, fingerprint photography, fingerprint answering of incoming calls, etc.

[0089] Temperature sensor 280J is used to detect temperature. In some embodiments, terminal 200 uses the temperature detected by temperature sensor 280J to execute a temperature processing strategy. For example, when the temperature reported by temperature sensor 280J exceeds a threshold, terminal 200 reduces the performance of the processor located near temperature sensor 280J to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is below another threshold, terminal 200 heats battery 242 to prevent abnormal shutdown of terminal 200 due to low temperature. In still other embodiments, when the temperature is below yet another threshold, terminal 200 boosts the output voltage of battery 242 to prevent abnormal shutdown due to low temperature.
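
The temperature processing strategy can be sketched as a set of threshold checks. The threshold values and action names below are hypothetical, and combining the three embodiments in one function is an assumption made for brevity.

```python
def thermal_actions(temp_c, high=45.0, low_heat=0.0, low_boost=-10.0):
    """Return the protective actions for a temperature reading in Celsius."""
    actions = []
    if temp_c > high:
        # Too hot: lower processor performance near the sensor.
        actions.append("throttle_processor")
    if temp_c < low_heat:
        # Cold: heat the battery to avoid an abnormal shutdown.
        actions.append("heat_battery")
    if temp_c < low_boost:
        # Colder still: boost the battery output voltage.
        actions.append("boost_battery_voltage")
    return actions
```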

[0090] Touch sensor 280K, also known as a "touch device," can be located on display screen 294. The touch sensor 280K and display screen 294 together form a touchscreen, also known as a "touch-controlled screen." Touch sensor 280K detects touch operations applied to or near it. The touch sensor can transmit the detected touch operation to the application processor to determine the type of touch event. Visual output related to the touch operation can be provided through display screen 294. In other embodiments, touch sensor 280K may also be located on the surface of terminal 200, in a different position than display screen 294.

[0091] The bone conduction sensor 280M can acquire vibration signals. In some embodiments, the bone conduction sensor 280M can acquire vibration signals from the vibrating bone segments of the human vocal cords. The bone conduction sensor 280M can also contact the human pulse to receive blood pressure signals. In some embodiments, the bone conduction sensor 280M can also be incorporated into headphones to form bone conduction headphones. The audio module 270 can parse the voice signals from the vibrating bone segments of the vocal cords acquired by the bone conduction sensor 280M to realize voice functionality. The application processor can parse heart rate information from the blood pressure signals acquired by the bone conduction sensor 280M to realize heart rate detection functionality.

[0092] Buttons 290 include a power button, volume buttons, etc. Buttons 290 can be mechanical buttons or touch-sensitive buttons. Terminal 200 can receive button input and generate key signal inputs related to user settings and function control of the terminal 200.

[0093] Motor 291 can generate vibration alerts. Motor 291 can be used for incoming call vibration alerts or for touch vibration feedback. For example, touch operations applied to different applications (such as taking photos, playing audio, etc.) can correspond to different vibration feedback effects. Touch operations applied to different areas of the display screen 294 can also correspond to different vibration feedback effects of motor 291. Different application scenarios (such as time reminders, receiving messages, alarm clocks, games, etc.) can also correspond to different vibration feedback effects. The touch vibration feedback effect can also be customized.

[0094] Indicator 292 can be an indicator light, which can be used to indicate charging status, power changes, messages, missed calls, notifications, etc.

[0095] The SIM card interface 295 is used to connect a SIM card. The SIM card can be inserted into or removed from the SIM card interface 295 to make contact with and separate from the terminal 200. The terminal 200 can support one or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 295 can support Nano SIM cards, Micro SIM cards, SIM cards, etc. Multiple cards can be inserted into the same SIM card interface 295 simultaneously. The multiple cards can be of the same or different types. The SIM card interface 295 is also compatible with different types of SIM cards. The SIM card interface 295 is also compatible with external memory cards. The terminal 200 interacts with the network through the SIM card to realize functions such as calls and data communication. In some embodiments, the terminal 200 uses an eSIM, i.e., an embedded SIM card. The eSIM card can be embedded in the terminal 200 and cannot be separated from the terminal 200.

[0096] With the development of computer vision and the improvement of edge computing power, gesture control has gradually become a way for users to interact with terminals.

[0097] See Figure 3, which is a schematic diagram of a gesture control scenario provided in an embodiment of this application. Figure 3 shows a television 301 and a user 302. The user 302 can input corresponding control commands to the television 301 by "extending their arms," causing the television 301 to perform corresponding actions, such as turning on the television or zooming in on the display interface.

[0098] See Figure 4, which is a schematic diagram of another gesture control scenario provided in an embodiment of this application. Figure 4 shows a mobile phone 401 and a user's hand 402. In the current state, the photo album on the mobile phone 401 is open, and multiple images are displayed on the screen. The user can input corresponding control commands to the mobile phone 401 by "swinging the hand downward," causing the mobile phone 401 to perform corresponding actions, for example, scrolling down through or switching the images on the display screen.

[0099] As can be seen from the above gesture control scenarios, gesture control does not require the user to hold a specific input device; they can control the terminal or input specific information into the terminal simply through specific hand movements. Specifically, the controlled terminal typically includes an image acquisition module and a gesture control model. The image acquisition module can collect the user's gesture data, and the gesture control model can recognize the user's hand movements based on this gesture data, thereby generating corresponding control commands. The image acquisition module can be a camera, and the gesture control model can be a neural network model; this application embodiment does not impose specific limitations on these aspects.

[0100] In practical applications, to improve the user experience, the gesture control model needs to be optimized to enhance the accuracy of gesture control during user interaction with the terminal. In other words, the goal is to make the gesture control function "better with use."

[0101] See Figure 5, which is a schematic diagram of a gesture control optimization scheme in the related art. Figure 5 shows a cloud platform 501 and a terminal 502, which are communicatively connected for information transmission. In some possible embodiments, the cloud platform 501 may also be referred to as a server.

[0102] In this scheme, the cloud platform 501 initially trains a gesture control model based on a shared dataset. When a terminal 502 needs to use the gesture control function, the cloud platform 501 deploys the gesture control model to that terminal 502. During use, the terminal 502 collects the user's gesture data so that the gesture control model can make predictions based on that data, thereby achieving the corresponding gesture control. At the same time, the terminal 502 stores this gesture data as part of the user data.

[0103] When the gesture control model needs optimization, terminal 502 uploads the stored gesture data to the cloud platform 501. The cloud platform 501 then trains, evaluates, and optimizes the gesture control model based on the uploaded gesture data. After the cloud platform 501 completes the optimization, it redeploys the optimized gesture control model to terminal 502, thus optimizing the gesture control model on terminal 502. In other words, the gesture control model is optimized on the cloud platform 501 side and then deployed to terminal 502.

[0104] However, the above-mentioned gesture control optimization methods mainly have the following problems:

[0105] 1) Gesture data often contains user privacy information. The above methods require uploading user data to the cloud, which poses a risk of leaking user privacy.

[0106] 2) The gesture data collected by the terminal does not contain tags. After the gesture data is uploaded to the cloud, the category of the gesture data needs to be manually labeled, which is costly.

[0107] 3) The data used to optimize the gesture control model in the cloud usually comes from multiple users, and the gesture control model cannot be optimized for a specific user.

[0108] To address the aforementioned issues, this application provides a gesture control optimization method that generates target gesture scene fusion data with labels and backgrounds on the terminal side, and uses the target gesture scene fusion data to complete the training and upgrading of the gesture control model on the terminal side.

[0109] See Figure 6, which is a schematic diagram of a gesture control optimization scheme provided in an embodiment of this application. In this embodiment, to facilitate the distinction between the gesture data collected by the terminal 602 and the gesture data generated by the gesture data generation model, the gesture data collected by the terminal 602 is referred to as "raw gesture data" (also written as "original gesture data" in this description); the gesture data generated by the gesture data generation model is referred to as "target gesture data"; and the target gesture data after fusing target scene data is referred to as "target gesture scene fusion data". A detailed description follows.

[0110] When the gesture control function is used for the first time by terminal 602, the cloud 601 deploys a gesture control model for terminal 602. During use, terminal 602 can periodically collect raw gesture data and target scene data through its camera. The target scene data is used to represent the background information associated with the raw gesture data. After collecting the raw gesture data and target scene data, the data is stored in the user data for later use. Additionally, terminal 602 calls a target gesture keypoint generation model to generate a large amount of target gesture keypoint data. The raw gesture data, target scene data, and target gesture keypoint data are input into the gesture data generation model to obtain target gesture scene fusion data.

[0111] It is understandable that after fusing target scene data and target gesture key point data, a large amount of target gesture scene fusion data will be obtained. Based on the target gesture scene fusion data, the gesture control model will be trained directly on the terminal 602 side to complete the optimization of the gesture control model.

[0112] See Figure 7, which is a schematic diagram of a data fusion scenario provided in an embodiment of this application. Figure 7 shows the original gesture data, the target scene data, the target gesture key point data, and the target gesture scene fusion data obtained after feature fusion.

[0113] The original gesture data consists of an image of a clenched fist captured by the terminal; the target gesture keypoint data consists of keypoints of an open palm generated by the target gesture keypoint generation model; and the target scene data consists of an image of the user's face. After feature fusion of the original gesture data, target scene data, and target gesture keypoint data, target gesture scene fusion data is obtained, where the background is the user's face and the gesture is an open palm.

[0114] Understandably, target gesture keypoint data is used to guide the size and shape of gestures in the target gesture scene fusion data; therefore, target gesture keypoint data can characterize the gesture category in the target gesture scene fusion data. Target scene data is used to guide the background of gestures in the target gesture scene fusion data; therefore, target scene data can characterize the background category in the target gesture scene fusion data. In other words, the target gesture scene fusion data generated after feature fusion contains gesture category labels and background category labels. Target gesture keypoint data is used to label the gesture category labels, and target scene data is used to label the background category labels. Additionally, the original gesture data is used to provide other information during feature fusion, such as the user's skin color.
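
The way labels are carried over from the inputs can be sketched as a small record type. The class and field names below are hypothetical; the only point illustrated is that the gesture category label is taken from the target gesture keypoint data and the background category label from the target scene data, so no manual labeling is required.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeypointTemplate:
    gesture_category: str   # for example "open_palm"
    points: tuple           # hypothetical (x, y) keypoint coordinates


@dataclass(frozen=True)
class SceneData:
    background_category: str  # for example "user_face"
    image_id: str


@dataclass(frozen=True)
class FusionSample:
    image_id: str
    gesture_label: str
    background_label: str


def label_fusion_sample(image_id, keypoints, scene):
    """Attach both labels to a generated image from its two guiding inputs."""
    return FusionSample(image_id, keypoints.gesture_category,
                        scene.background_category)
```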

[0115] The gesture control optimization scheme provided in this application has the following advantages:

[0116] 1) The gesture control model is trained and optimized directly on the terminal side, and user data does not need to be uploaded to the cloud, which can better protect user privacy;

[0117] 2) The target gesture key point data guides the size and shape of the gesture, and the target scene data replaces the background of the gesture to generate target gesture scene fusion data with rich backgrounds and different categories. This target gesture scene fusion data comes with labels and forms a variety of potential gesture usage scenarios, which can more accurately optimize the gesture control model.

[0118] 3) The target gesture scene fusion data usually comes from the same user, which can accurately optimize the gesture control model for that user.

[0119] See Figure 8, which is a schematic flowchart of a gesture control optimization method provided in an embodiment of this application. As shown in Figure 8, the method mainly includes the following steps.

[0120] Step S801: Collect the user's original gesture data and target scene data, wherein the target scene data is used to characterize the background information associated with the original gesture data.

[0121] In this embodiment of the application, in order to facilitate the distinction between the gesture data collected by the terminal and the gesture data generated by the gesture data generation model, the gesture data collected by the terminal is referred to as "raw gesture data"; the gesture data generated by the gesture data generation model is referred to as "target gesture data"; and the target gesture data after fusing the target scene data is referred to as "target gesture scene fusion data".

[0122] It should be noted that the terminal can collect raw gesture data and target scene data when the user performs gesture control operations, or it can collect raw gesture data and target scene data at other times according to preset data collection rules. In addition, raw gesture data and target scene data can be collected separately or simultaneously, and this application embodiment does not impose specific restrictions on this.

[0123] It can be understood that similar information often recurs within the same usage scenario of a terminal. Using target scene data as the background for gestures can therefore improve the accuracy of gesture recognition. For example, since a terminal typically corresponds to one user, the user's facial image can be collected as target scene data; or, if a user typically uses the terminal while sitting on a sofa in the living room, the wall behind the sofa can be collected as target scene data.

[0124] Step S802: Generate target gesture scene fusion data based on the original gesture data, the target scene data, and the target gesture key point data.

[0125] In one optional embodiment, the terminal can invoke a target gesture keypoint generation model to generate target gesture keypoint data. After inputting the original gesture data, target scene data, and target gesture keypoint data into the gesture data generation model for feature fusion, target gesture scene fusion data is generated. That is, this target gesture scene fusion data simultaneously integrates information from the original gesture data, the target scene data, and the target gesture keypoint data. Specifically, the target gesture keypoint data guides the size and shape of the gesture in the target gesture scene fusion data; the target scene data guides the background of the gesture in the target gesture scene fusion data; and the original gesture data provides information for other aspects during feature fusion, such as the user's skin color.

[0126] In this embodiment, by combining target gesture key point data and target scene data to expand the original gesture data, a large amount of target gesture scene fusion data can be generated, providing the terminal with sufficient data for training the gesture control model. Furthermore, by using target gesture key point data to guide the size and shape of the gesture, and by replacing the gesture background with target scene data, rich and diverse target gesture scene fusion data with different backgrounds is generated. This target gesture scene fusion data is inherently labeled and forms multiple potential gesture usage scenarios, enabling more precise optimization of the gesture control model.
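
The data expansion described above is combinatorial: every raw sample can be paired with every gesture category and every scene category. The sketch below only enumerates those combinations and their labels; the function name and the example categories are assumptions, and image generation itself is left to the gesture data generation model.

```python
from itertools import product


def expand(raw_ids, gesture_categories, scene_categories):
    """List every (raw sample, gesture, scene) combination with its labels."""
    return [
        {"source": raw, "gesture_label": gesture, "background_label": scene}
        for raw, gesture, scene in product(raw_ids, gesture_categories,
                                           scene_categories)
    ]


# Two raw captures, three gesture categories, two scenes -> 12 labeled samples.
plan = expand(["raw_1", "raw_2"], ["fist", "open_palm", "ok"], ["face", "wall"])
```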

[0127] In one alternative embodiment, the gesture data generation model can be a Generative Adversarial Network (GAN) model. Specifically, the gesture data generation model can be further divided into ensemble models and cascaded models, which will be described separately below.

[0128] See Figure 9, which is a schematic diagram of a feature fusion scenario based on an ensemble model provided in an embodiment of this application. The ensemble model includes a single gesture data generation model, namely a first gesture data generation model. The first gesture data generation model includes a first generator and a first discriminator.

[0129] The original gesture data, target scene data, and target gesture keypoint data are input into the first generator. After convolution and deconvolution operations, a new gesture image is generated, which is the target gesture scene fusion data. As can be seen from the figure, the gesture in the new gesture image corresponds to the gesture in the target gesture keypoint data; the background in the new gesture image corresponds to the target scene data. In other words, through feature fusion, the generator replaces the category and background of the original gesture.

[0130] Furthermore, the first discriminator determines the authenticity, gesture category, and scene category of the target gesture scene fusion data. Through a game between the first generator and the first discriminator, target gesture scene fusion data with gesture category labels and scene category labels is obtained.
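
One plausible way to express the three judgments of the first discriminator is a summed loss with one term per output, in the style of an auxiliary-classifier GAN. This formulation is an assumption; the description only states that the discriminator judges authenticity, gesture category, and scene category.

```python
import math

EPS = 1e-12  # keeps log() away from zero


def binary_cross_entropy(p_real, is_real):
    """Loss of the real/fake output, where p_real is the predicted probability."""
    p = min(max(p_real, EPS), 1.0 - EPS)
    return -math.log(p) if is_real else -math.log(1.0 - p)


def cross_entropy(probs, target_index):
    """Loss of a category output given predicted class probabilities."""
    return -math.log(max(probs[target_index], EPS))


def discriminator_loss(p_real, is_real, gesture_probs, gesture_idx,
                       scene_probs, scene_idx):
    """Sum of the authenticity, gesture-category, and scene-category terms."""
    return (binary_cross_entropy(p_real, is_real)
            + cross_entropy(gesture_probs, gesture_idx)
            + cross_entropy(scene_probs, scene_idx))
```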

[0131] In a specific implementation, the first gesture data generation model can be a GAN model. This application does not impose specific limitations on this.

[0132] See Figure 10, which is a schematic diagram of a feature fusion scenario based on a cascaded model provided in an embodiment of this application. The cascaded model includes two gesture data generation models: a second gesture data generation model and a third gesture data generation model. The second gesture data generation model includes a second generator and a second discriminator; the third gesture data generation model includes a third generator and a third discriminator.

[0133] First, the original gesture data and target gesture keypoint data are input into the second generator. After convolution and deconvolution operations, target gesture data is generated, which is a new gesture image (at this point, it does not contain background information). As can be seen from the figure, the gesture in the new gesture image corresponds to the gesture in the target gesture keypoint data. That is, through this feature fusion, the generator replaces the category of the original gesture. Further, the second discriminator judges the authenticity and gesture category of the target gesture data. Through the interaction between the second generator and the second discriminator, target gesture data with gesture category labels is obtained.

[0134] Secondly, the target gesture data and target scene data obtained in the above steps are fused again, and the result is input into the third generator. After convolution and deconvolution operations, target gesture scene fusion data is generated. As can be seen from the figure, the target gesture scene fusion data adds the corresponding background information from the target scene data. In other words, through this feature fusion, the third generator replaces the background of the original gesture. Furthermore, the third discriminator judges the authenticity and scene category of the target gesture scene fusion data. Through the game between the third generator and the third discriminator, target gesture scene fusion data with gesture category labels and scene category labels is obtained.
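
The two-stage flow can be sketched as a composition of two functions. The stand-ins below only pass labels and an appearance field through dictionaries; they are hypothetical placeholders for the second and third generators, which in practice would be trained networks operating on images.

```python
def second_generator(raw_gesture, keypoints):
    """Stage 1 stand-in: replace the gesture category, keep the user's
    appearance, and produce no background yet."""
    return {
        "appearance": raw_gesture["appearance"],
        "gesture_label": keypoints["category"],
        "background_label": None,
    }


def third_generator(target_gesture, scene):
    """Stage 2 stand-in: add the background taken from the target scene data."""
    fused = dict(target_gesture)
    fused["background_label"] = scene["category"]
    return fused


def cascaded_generate(raw_gesture, keypoints, scene):
    """Run stage 1, then feed its output and the scene into stage 2."""
    return third_generator(second_generator(raw_gesture, keypoints), scene)
```

Because the second stage consumes the output of the first, any defect introduced in stage 1 is carried into the final fusion data.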

[0135] It should be noted that the scheme of generating target gesture scene fusion data through a cascaded model is simple to train and easy to implement. However, because this scheme uses a two-level gesture data generation model, it is prone to cumulative errors. Those skilled in the art can choose an ensemble model or a cascaded model to generate target gesture scene fusion data according to actual needs.

[0136] In specific implementations, the second gesture data generation model and / or the third gesture data generation model can be GAN models. This application does not impose specific limitations in this regard.

[0137] Step S803: Train the gesture control model using the target gesture scene fusion data and optimize the gesture control model, wherein the gesture control model is used to recognize the user's gesture control operation.

[0138] Specifically, after obtaining the target gesture scene fusion data, the gesture control model can be trained on the terminal side based on the target gesture scene fusion data, thereby optimizing the gesture control model. Through the terminal's self-learning, the user experience of the terminal becomes better and better.
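
On-terminal training can be illustrated with a deliberately small stand-in. The sketch below fine-tunes a two-feature logistic regression by stochastic gradient descent on labeled samples; the model, the feature vectors, and the hyperparameters are all assumptions, since the actual gesture control model may be a neural network trained on images.

```python
import math


def predict(weights, bias, features):
    """Probability that the sample belongs to the positive gesture class."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune(weights, bias, samples, lr=0.5, epochs=200):
    """Update the model in place on the device, one sample at a time."""
    weights = list(weights)
    for _ in range(epochs):
        for features, label in samples:
            error = predict(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


# Hypothetical labeled fusion samples: (feature vector, gesture label).
fusion_samples = [([1.0, 0.0], 1), ([0.9, 0.1], 1),
                  ([0.0, 1.0], 0), ([0.1, 0.9], 0)]
weights, bias = fine_tune([0.0, 0.0], 0.0, fusion_samples)
```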

[0139] The gesture control optimization scheme provided in this application has the following advantages:

[0140] 1) The gesture control model is trained and optimized directly on the terminal side, and user data does not need to be uploaded to the cloud, which can better protect user privacy;

[0141] 2) The target gesture key point data guides the size and shape of the gesture, and the target scene data replaces the background of the gesture to generate target gesture scene fusion data with rich backgrounds and different categories. This target gesture scene fusion data comes with labels and forms a variety of potential gesture usage scenarios, which can more accurately optimize the gesture control model.

[0142] 3) The target gesture scene fusion data usually comes from the same user, which can accurately optimize the gesture control model for that user.

[0143] Corresponding to the above method embodiments, this application also provides a gesture control optimization device.

[0144] See Figure 11, which is a schematic diagram of the structure of a gesture control optimization device provided in an embodiment of this application. As shown in Figure 11, the gesture control optimization device includes an acquisition module 1101 and a gesture data generation module 1102.

[0145] Specifically, the acquisition module 1101 is used to acquire the user's original gesture data and target scene data, wherein the target scene data is used to characterize the background information associated with the original gesture data; the gesture data generation module 1102 is used to generate target gesture scene fusion data based on the original gesture data, the target scene data, and target gesture key point data, wherein the target gesture scene fusion data is used to optimize the gesture control model; wherein the target gesture key point data is used to characterize the gesture category of the target gesture scene fusion data, and the target scene data is used to characterize the background category of the target gesture scene fusion data.

[0146] In a specific implementation, the acquisition module 1101 can be a camera on the terminal or other types of sensors, and this application embodiment does not impose specific limitations on this.

[0147] In one optional embodiment, the gesture data generation module 1102 is specifically used to: input the original gesture data, the target scene data, and the target gesture key point data into the first gesture data generation model to generate target gesture scene fusion data.

[0148] In one optional embodiment, the gesture data generation module 1102 is specifically used to: input the original gesture data and the target gesture key point data into a second gesture data generation model to generate target gesture data; and input the target gesture data and the target scene data into a third gesture data generation model to generate target gesture scene fusion data.

[0149] In one optional embodiment, the gesture control optimization device further includes: a target gesture key point data generation module, used to call the target gesture key point generation model to generate target gesture key point data.

[0150] In one optional embodiment, the gesture control optimization device further includes: a training module, used to train the gesture control model using the target gesture scene fusion data, and optimize the gesture control model, wherein the gesture control model is used to recognize the user's gesture control operations.
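As a rough illustration of what a training module running on the terminal might do, the sketch below updates a model incrementally from labelled fusion data. The application does not specify the gesture control model; a nearest-centroid classifier over feature vectors is swapped in here purely as an assumption, and `CentroidGestureModel` and `train_on_device` are hypothetical names.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


class CentroidGestureModel:
    """Stand-in gesture control model: one running centroid per gesture label."""

    def __init__(self) -> None:
        self._sums: Dict[str, List[float]] = {}
        self._counts: Dict[str, int] = {}

    def update(self, features: Sequence[float], gesture_label: str) -> None:
        # Accumulate feature sums so the centroid can be derived on demand.
        if gesture_label not in self._sums:
            self._sums[gesture_label] = [0.0] * len(features)
            self._counts[gesture_label] = 0
        sums = self._sums[gesture_label]
        for i, value in enumerate(features):
            sums[i] += value
        self._counts[gesture_label] += 1

    def predict(self, features: Sequence[float]) -> str:
        def squared_distance(label: str) -> float:
            n = self._counts[label]
            return sum((s / n - x) ** 2 for s, x in zip(self._sums[label], features))

        # Requires at least one prior update; otherwise min() raises ValueError.
        return min(self._sums, key=squared_distance)


FusionItem = Tuple[Sequence[float], str, str]  # (features, gesture label, background label)


def train_on_device(model: CentroidGestureModel,
                    fusion_data: Iterable[FusionItem]) -> CentroidGestureModel:
    """Consume fusion data locally; nothing is sent off the terminal."""
    for features, gesture_label, _background_label in fusion_data:
        model.update(features, gesture_label)
    return model
```

In this sketch the background label is not used for training; it is carried along only because each fusion sample has both labels.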

[0151] In one optional embodiment, the acquisition module 1101 is specifically used to: acquire the user's original gesture data and target scene data when the user performs a gesture control operation.

[0152] The gesture control optimization scheme provided in this application has the following advantages:

[0153] 1) The gesture control model is trained and optimized directly on the terminal side, and user data does not need to be uploaded to the cloud, which can better protect user privacy;

[0154] 2) The target gesture key point data guides the size and shape of the gesture, and the target scene data replaces the background of the gesture to generate target gesture scene fusion data with rich backgrounds and different categories. This target gesture scene fusion data comes with labels and forms a variety of potential gesture usage scenarios, which can more accurately optimize the gesture control model.

[0155] 3) The target gesture scene fusion data usually comes from the same user, which can accurately optimize the gesture control model for that user.
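Advantage 2) rests on pairing each target gesture with each target scene, so that every generated sample carries a gesture category label and a background category label. The pairing can be sketched as a Cartesian product; the category names and the helper `plan_fusion_labels` are hypothetical examples, not values from the application.

```python
from itertools import product
from typing import List, Sequence, Tuple


def plan_fusion_labels(gesture_categories: Sequence[str],
                       background_categories: Sequence[str]) -> List[Tuple[str, str]]:
    """List every (gesture label, background label) pair to be generated."""
    return list(product(gesture_categories, background_categories))


pairs = plan_fusion_labels(["fist", "open_palm", "pinch"], ["office", "kitchen"])
for gesture_label, background_label in pairs:
    print(gesture_label, background_label)
```

With three gesture categories and two backgrounds this yields six labelled combinations from a single user's data, which is the sense in which the generated data set is richer than the raw captures.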

[0156] The specific details of the above-mentioned device implementation can be found in the method embodiments, and will not be repeated here for the sake of brevity.

[0157] In a specific implementation, this application also provides a terminal, which includes one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory, and the one or more computer programs include instructions that, when executed by the terminal, cause the terminal to perform some or all of the steps in the above embodiments.

[0158] In a specific implementation, this application also provides a computer-readable storage medium that can store a program, wherein, when the program runs, it controls the device on which the computer-readable storage medium is located to execute some or all of the steps in the above embodiments. The storage medium can be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.

[0159] In a specific implementation, this application also provides a computer program product, which includes executable instructions that, when executed on a computer, cause the computer to perform some or all of the steps in the above method embodiments.

[0160] In the embodiments of this application, "at least one" refers to one or more, and "more than one" refers to two or more. "And/or" describes the relationship between related objects, indicating that three relationships can exist. For example, A and/or B can represent the existence of A alone, the simultaneous existence of A and B, or the existence of B alone. A and B can be singular or plural. The character "/" generally indicates that the preceding and following related objects are in an "or" relationship. "At least one of the following" and similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, and c can represent: a, b, c, ab, ac, bc, or abc, where a, b, and c can be single or multiple.
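The enumeration given for "at least one of a, b, and c" is the set of non-empty combinations of the three items. A short sketch using Python's standard `itertools.combinations` reproduces it; the function name `at_least_one_of` is only illustrative.

```python
from itertools import combinations
from typing import List, Sequence


def at_least_one_of(items: Sequence[str]) -> List[str]:
    """Return every non-empty combination of items, smallest combinations first."""
    return [
        "".join(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


print(at_least_one_of(["a", "b", "c"]))
# ['a', 'b', 'c', 'ab', 'ac', 'bc', 'abc']
```

For n items there are 2^n - 1 such combinations, which gives the seven cases listed for three items.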

[0161] Those skilled in the art will recognize that the units and algorithm steps described in the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of electronic hardware and software. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.

[0162] Those skilled in the art will understand that, for convenience and brevity, the specific working processes of the systems, devices, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.

[0163] In several embodiments provided by this invention, any function, if implemented as a software functional unit and sold or used as an independent product, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0164] The above description is merely a specific embodiment of the present invention. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this invention should be included within the protection scope of this invention. The protection scope of this invention should be determined by the scope of the claims.

Claims

1. A gesture control optimization method, characterized in that the method is applied to a terminal and includes: collecting the user's original gesture data and target scene data, wherein the target scene data is used to characterize the background information associated with the original gesture data; calling a target gesture key point generation model to generate target gesture key point data; and generating target gesture scene fusion data based on the original gesture data, the target scene data, and the target gesture key point data, wherein the target gesture scene fusion data is used to optimize a gesture control model; wherein the target gesture scene fusion data includes gesture category labels and background category labels, the gesture category labels are matched with the target gesture key point data, and the background category labels are matched with the target gesture scene fusion data.

2. The method according to claim 1, characterized in that the step of generating target gesture scene fusion data based on the original gesture data, the target scene data, and the target gesture key point data includes: inputting the original gesture data, the target scene data, and the target gesture key point data into a first gesture data generation model to generate the target gesture scene fusion data.

3. The method according to claim 1, characterized in that the step of generating target gesture scene fusion data based on the original gesture data, the target scene data, and the target gesture key point data includes: inputting the original gesture data and the target gesture key point data into a second gesture data generation model to generate target gesture data; and inputting the target gesture data and the target scene data into a third gesture data generation model to generate the target gesture scene fusion data.

4. The method according to claim 1, characterized in that, after generating the target gesture scene fusion data based on the original gesture data, the target scene data, and the target gesture key point data, the method further includes: training and optimizing the gesture control model using the target gesture scene fusion data, wherein the gesture control model is used to recognize the user's gesture control operations.

5. The method according to claim 1, characterized in that collecting the user's original gesture data and target scene data includes: collecting the user's original gesture data and the target scene data when the user performs a gesture control operation.

6. A gesture control optimization device, characterized in that it includes: an acquisition module, used to acquire the user's original gesture data and target scene data, wherein the target scene data is used to characterize the background information associated with the original gesture data; a target gesture key point data generation module, used to call a target gesture key point generation model to generate target gesture key point data; and a gesture data generation module, used to generate target gesture scene fusion data based on the original gesture data, the target scene data, and the target gesture key point data, wherein the target gesture scene fusion data is used to optimize a gesture control model; wherein the target gesture scene fusion data includes gesture category labels and background category labels, the gesture category labels are matched with the target gesture key point data, and the background category labels are matched with the target gesture scene fusion data.

7. The device according to claim 6, characterized in that the gesture data generation module is specifically used to: input the original gesture data, the target scene data, and the target gesture key point data into a first gesture data generation model to generate the target gesture scene fusion data.

8. The device according to claim 6, characterized in that the gesture data generation module is specifically used to: input the original gesture data and the target gesture key point data into a second gesture data generation model to generate target gesture data; and input the target gesture data and the target scene data into a third gesture data generation model to generate the target gesture scene fusion data.

9. The device according to claim 6, characterized in that it further includes: a training module, used to train and optimize the gesture control model using the target gesture scene fusion data, wherein the gesture control model is used to recognize the user's gesture control operations.

10. The device according to claim 6, characterized in that the acquisition module is specifically used to: acquire the user's original gesture data and the target scene data when the user performs a gesture control operation.

11. A terminal, characterized in that it includes: one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs including instructions that, when executed by the terminal, cause the terminal to perform the method of any one of claims 1-5.

12. A computer-readable storage medium, characterized in that the computer-readable storage medium includes a stored program, wherein, when the program is executed, it controls the device on which the computer-readable storage medium is located to perform the method according to any one of claims 1-5.