An adaptive multi-channel remote gesture interaction method and system

By employing an adaptive multi-channel remote attitude interaction method with dynamic sensor adaptation and intelligent trajectory prediction, the application addresses interaction instability caused by differing hardware configurations and unstable networks, achieving a stable and smooth remote control experience in complex environments.

CN122261384A · Pending · Publication Date: 2026-06-23 · CHENGDU LEISHITIANDI ELECTRONIC TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHENGDU LEISHITIANDI ELECTRONIC TECHNOLOGY CO LTD
Filing Date
2026-03-10
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing remote interaction solutions based on mobile terminal posture perception suffer from insufficient reliability and smoothness due to differences in hardware configuration, magnetic field interference, and network instability, making it difficult to achieve continuity and stability in user operations.

Method used

An adaptive multi-channel remote attitude interaction method is adopted, which combines dynamic sensor adaptation, intelligent trajectory prediction and network adaptation technology with multi-channel hot switching and state management mechanism to ensure the stability and continuity of the interaction process.

Benefits of technology

In complex environments, it achieves reliable remote control with hardware robustness, network adaptability, and user-friendly experience, improving the smoothness and real-time performance of interaction, and reducing operation latency and cursor lag.



Abstract

The application relates to an adaptive multi-channel remote gesture interaction method and system. The method comprises the following steps: generating a working mode identifier according to the availability state of the sensors; collecting raw sensor data streams and performing attitude calculation to generate standardized attitude vectors; inputting the vectors into a trajectory prediction model trained on historical data to generate predicted trajectory data; combining the bandwidth characteristics of the current channel with the predicted data to perform frame rate adaptation and motion smoothing on the attitude vectors, generating optimized instruction data packets; acquiring user operation events and performing hierarchical local feedback while sending the instruction data packets; when the monitored network packet loss rate exceeds a threshold, generating a session state snapshot and synchronizing it to a backup channel to complete a hot switch; parsing the instructions and the snapshot at the remote end to render the cursor trajectory, and generating an updated session state snapshot based on trajectory behavior characteristics; and storing the updated snapshot in a trajectory tracking database for subsequent state recovery. The application can improve the user's remote interaction experience in complex scenarios.

Description

Technical Field

[0001] This application relates to the field of human-computer interaction technology, and in particular to an adaptive multi-channel remote posture interaction method and system.

Background Technology

[0002] With the increasing popularity of large-screen display devices such as smart TVs, smart projectors, and large-format conference all-in-one machines, users' demand for precise and smooth interaction from a distance is constantly growing. Traditional infrared or Bluetooth remote controls, which use directional keys to move the screen focus, are inefficient and offer a poor user experience when performing complex tasks. To address this, the industry has gradually developed an advanced remote control interaction method that utilizes mobile smart terminals (such as smartphones). This method fully leverages the various sensors built into the phone to detect changes in the user's spatial posture while holding the phone, mapping and controlling the cursor movement on the remote screen. This achieves a directional interaction effect similar to an air mouse or motion-sensing stick, aiming to provide a more direct and natural remote operation experience.

[0003] However, this remote interaction solution based on mobile terminal posture awareness still faces a series of severe challenges in terms of stability and user experience during actual deployment and application. First, there are significant differences in the hardware configurations of mobile devices. Low-end and mid-range devices may lack key sensors such as magnetometers, while even on high-end devices with complete sensors, strong magnetic field interference in the daily environment (such as proximity to metal furniture or magnetized whiteboards) can easily cause cursor drift or even loss of control in systems that rely on magnetometers for orientation calibration, greatly reducing the reliability and universality of the interaction system. Second, the dynamic instability of the wireless network environment is an unavoidable problem. Bandwidth fluctuations, latency jitter, and data packet loss in Wi-Fi or Bluetooth channels can directly cause cursor movement stuttering, jumping, or delayed command response, severely disrupting the smoothness and real-time performance of the interaction. Furthermore, during the interaction process, if switching between different communication channels is necessary due to network degradation, maintaining the continuity of the user's operation session, avoiding cursor position resets or loss of incomplete commands, and achieving a smooth transition that is imperceptible to the user are also urgent problems to be solved.

Summary of the Invention

[0004] To address the aforementioned technical issues, this application provides an adaptive multi-channel remote posture interaction method and system.

[0005] In a first aspect, this application provides an adaptive multi-channel remote posture interaction method, which adopts the following technical solution:
obtaining sensor configuration information from the mobile control terminal and generating a working mode identifier based on the availability status of the gyroscope, accelerometer, and magnetometer;
collecting raw sensor data streams from the mobile control terminal, performing attitude calculation based on the working mode identifier, and generating a standardized attitude vector;
training a trajectory prediction model based on a historical attitude vector sequence, and inputting the standardized attitude vector into the trajectory prediction model to generate predicted trajectory data;
obtaining the bandwidth characteristics of the current transmission channel, combining them with the predicted trajectory data, performing frame rate adaptation and motion smoothing on the standardized attitude vector, and generating an optimized instruction data packet;
acquiring user operation event data, sending the optimized instruction data packet to the remote receiving end through the current transmission channel, and simultaneously performing hierarchical local feedback based on the user operation event data;
monitoring the network quality indicators of the current transmission channel and, when the network packet loss rate exceeds the preset packet loss rate threshold, generating a session state snapshot containing the cursor position and event sequence, synchronizing it to the backup transmission channel, and updating the current transmission channel to the backup transmission channel;
receiving the optimized instruction data packet and the session state snapshot at the remote receiving end, parsing them and rendering the screen cursor trajectory, and generating an updated session state snapshot based on the cursor trajectory behavior characteristics;
storing the updated session state snapshot in the trajectory tracking database of the mobile control terminal for state recovery during subsequent transmission channel switching.

[0006] By adopting the above technical solutions, a dynamic sensor adaptation mechanism is used to easily cope with hardware limitations and environmental interference. Intelligent trajectory prediction and network adaptation technologies ensure the smoothness and real-time performance of control commands. Furthermore, an efficient multi-channel hot-switching and state management mechanism ensures the continuity and stability of the interaction process. Ultimately, this application transforms remote pointing interaction, which was originally constrained by the hardware and software environment, into a reliable service that combines hardware robustness, network adaptability, and user-friendly experience, achieving stable and smooth control effects in complex real-world environments.

[0007] In a second aspect, this application provides an adaptive multi-channel remote posture interaction system, which adopts the following technical solution:
a working mode decision module, used to obtain the sensor configuration information of the mobile control terminal and generate a working mode identifier based on the availability status of the gyroscope, accelerometer, and magnetometer;
an attitude calculation module, used to collect the raw sensor data stream of the mobile control terminal, perform attitude calculation according to the working mode identifier, and generate a standardized attitude vector;
a trajectory prediction module, used to train a trajectory prediction model based on historical attitude vector sequences and input the standardized attitude vector into the trajectory prediction model to generate predicted trajectory data;
an optimized instruction data generation module, used to obtain the bandwidth characteristics of the current transmission channel, combine them with the predicted trajectory data, perform frame rate adaptation and motion smoothing on the standardized attitude vector, and generate an optimized instruction data packet;
a local feedback separation and transmission module, used to acquire user operation event data, send the optimized instruction data packet to the remote receiving end through the current transmission channel, and perform hierarchical local feedback based on the user operation event data;
a session state snapshot module, used to monitor the network quality indicators of the current transmission channel and, when the network packet loss rate exceeds a preset packet loss rate threshold, generate a session state snapshot containing the cursor position and event sequence and synchronize it to the backup transmission channel;
a channel hot-switching module, used to update the current transmission channel to the backup transmission channel;
a remote rendering and behavior analysis module, used to receive the optimized instruction data packet and the session state snapshot at the remote receiving end, parse them and render the screen cursor trajectory, and generate an updated session state snapshot based on the cursor trajectory behavior characteristics;
a trajectory tracking database module, used to store the updated session state snapshot in the trajectory tracking database of the mobile control terminal for state recovery during subsequent transmission channel switching.

[0008] In a third aspect, this application provides a computer-readable storage medium, which adopts the following technical solution: a computer-readable storage medium storing a computer program that can be loaded by a processor to execute any of the methods of the first aspect.

[0009] In summary, this application includes at least one of the following beneficial technical effects. A sensor adaptive degradation mechanism removes the existing technology's forced dependence on a nine-axis sensor: the system automatically switches to a six-axis or three-axis mode when the magnetometer is missing or subjected to strong magnetic field interference, which ensures compatibility with most mobile devices and improves anti-interference capability. A multi-channel command pipeline with hot-switching technology lets the system switch seamlessly between transmission channels such as Wi-Fi and Bluetooth, maintaining a stable frame rate and smooth cursor rendering even under high packet loss rates. Trajectory prediction and real-time tactile feedback on the device side significantly reduce operation latency, so that users obtain a stable and smooth remote control experience even when facing away from a large screen or in a weak network environment. Together, these effects give the application broad practical value and market adaptability.

Attached Figure Description

[0010] Figure 1 is a first flowchart of an adaptive multi-channel remote posture interaction method according to one embodiment of this application.

[0011] Figure 2 is a second flowchart of an adaptive multi-channel remote posture interaction method according to one embodiment of this application.

[0012] Figure 3 is a third flowchart of an adaptive multi-channel remote posture interaction method according to one embodiment of this application.

[0013] Figure 4 is a fourth flowchart of an adaptive multi-channel remote posture interaction method according to one embodiment of this application.

[0014] Figure 5 is a fifth flowchart of an adaptive multi-channel remote posture interaction method according to one embodiment of this application.

[0015] Figure 6 is a sixth flowchart of an adaptive multi-channel remote posture interaction method according to one embodiment of this application.

[0016] Figure 7 is a seventh flowchart of an adaptive multi-channel remote posture interaction method according to one embodiment of this application.

[0017] Figure 8 is an eighth flowchart of an adaptive multi-channel remote posture interaction method according to one embodiment of this application.

Detailed Implementation

[0018] To make the purpose, technical solution, and advantages of this application clearer, the application is described in further detail below with reference to Figures 1-8 and the embodiments. It should be understood that the specific embodiments described herein are for illustrative purposes only and are not intended to limit the scope of the application.

[0019] This application discloses an adaptive multi-channel remote posture interaction method.

[0020] Referring to Figure 1, an adaptive multi-channel remote posture interaction method specifically includes the following steps. Step S101: Obtain sensor configuration information from the mobile control terminal and generate a working mode identifier based on the availability status of the gyroscope, accelerometer, and magnetometer. The underlying logic of this step is to build a dynamic sensor resource management layer, because the hardware configurations of mobile devices (such as mobile phones) vary greatly and not all devices are equipped with a complete nine-axis sensor set (i.e., a three-axis gyroscope, a three-axis accelerometer, and a three-axis magnetometer).

[0021] Specifically, upon system startup, the system first detects the actual availability and data quality of these sensors in the current mobile control terminal device. For example, it checks whether the magnetometer exists; if so, it further reads its initial data to determine whether it is in an invalid state due to strong magnetic field interference.

[0022] Based on this detection result, the system generates a clear working mode identifier, such as "nine-axis mode" (all sensors present and functioning normally), "six-axis mode" (magnetometer missing or malfunctioning, using only the gyroscope and accelerometer), or the more basic "three-axis mode" (using only the gyroscope). This identifier defines the sensor data source and fusion strategy on which the subsequent attitude calculation algorithm relies, thus achieving broad compatibility at the hardware level. This ensures that the system can start and run on mobile terminals of different tiers and device types, guaranteeing access to the core functions.
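The mode decision in step S101 can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the names `SensorConfig`, `magnetometer_usable`, and `select_working_mode` are hypothetical, and the 20 to 70 microtesla plausibility window is an assumed quality check chosen to bracket the typical strength of the Earth's field. Three-axis mode is treated here as the fallback when only a single inertial sensor is usable.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class WorkingMode(Enum):
    NINE_AXIS = "nine-axis"
    SIX_AXIS = "six-axis"
    THREE_AXIS = "three-axis"


@dataclass
class SensorConfig:
    """Hypothetical container for the detected sensor configuration."""
    has_gyroscope: bool
    has_accelerometer: bool
    # Initial magnetometer reading in microtesla, or None if the sensor is absent.
    magnetometer_ut: Optional[Tuple[float, float, float]] = None


def magnetometer_usable(reading: Optional[Tuple[float, float, float]],
                        low_ut: float = 20.0, high_ut: float = 70.0) -> bool:
    """Assumed quality check: field magnitude must look like the Earth's field."""
    if reading is None:
        return False
    magnitude = math.sqrt(sum(c * c for c in reading))
    return low_ut <= magnitude <= high_ut


def select_working_mode(config: SensorConfig) -> WorkingMode:
    """Map sensor availability to a working mode identifier."""
    inertial_pair = config.has_gyroscope and config.has_accelerometer
    if inertial_pair and magnetometer_usable(config.magnetometer_ut):
        return WorkingMode.NINE_AXIS
    if inertial_pair:
        # Magnetometer missing or disturbed: degrade instead of failing.
        return WorkingMode.SIX_AXIS
    if config.has_gyroscope or config.has_accelerometer:
        return WorkingMode.THREE_AXIS
    raise ValueError("no usable inertial sensor")
```

A reading far outside the plausibility window, for example near a magnet, degrades the device to six-axis mode even though the magnetometer is physically present.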

[0023] Step S102: Collect the raw sensor data stream from the mobile control terminal, perform attitude calculation based on the working mode identifier, and generate a standardized attitude vector. Different operating modes require different algorithms to calculate the device's spatial attitude from the raw data. This is because gyroscopes provide angular velocity, accelerometers measure linear acceleration including gravitational acceleration, and magnetometers provide an orientation reference relative to the Earth's magnetic north pole. The operating mode designation directly determines which sensor fusion algorithm (such as a quaternion-based complementary filter or a Kalman filter) is used.

[0024] In some embodiments, in "nine-axis mode" the system fuses data from all three sensor types to compute an absolute device orientation in which gyroscope drift is corrected. In "six-axis mode", lacking orientation correction from the magnetometer, the system relies only on the gyroscope and accelerometer to estimate a relative attitude. Although the heading may drift slowly, this mode effectively avoids the instantaneous jumps caused by magnetic field interference.

[0025] It should be noted that, regardless of the algorithm used, the final output of this step is a "standardized attitude vector" (usually represented by a quaternion or rotation matrix), which uniformly describes the rotational state of the device in three-dimensional space.

[0026] Step S103: Train the trajectory prediction model based on the historical attitude vector sequence, input the standardized attitude vector into the trajectory prediction model, and generate predicted trajectory data. This step introduces an artificial intelligence prediction mechanism, which aims to proactively address the interaction lag caused by network latency and jitter. The logical principle is that user operation behavior usually has continuity and regularity in a short period of time. The system can train a trajectory prediction model by continuously recording the historical standardized posture vector sequence in a short period of time. For example, an LSTM recurrent neural network can be used in the embodiment of this application.

[0027] Specifically, LSTM models excel at learning long-term dependencies in sequential data. When a new current attitude vector is input, the model does not simply repeat the history, but analyzes the current motion trend (such as velocity, angular velocity, and acceleration) to predict possible changes in the device's attitude within the next tens of milliseconds and generate predicted trajectory data.

[0028] Understandably, the predicted trajectory data acts as a forward-looking buffer for cursor movement. When network conditions are good, this predicted data can serve as a smoothing compensation; when network latency or packet loss occurs, this predicted data can greatly alleviate screen cursor stuttering or jumping caused by lost or delayed packets, thereby improving visual smoothness.
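To make the idea of a forward-looking buffer concrete, the sketch below extrapolates the pointing angles a few tens of milliseconds ahead. Note the substitution: the embodiment names an LSTM model, whereas this example uses plain constant-velocity extrapolation as a much simpler stand-in so that it stays self-contained. The `Sample` layout (timestamp, yaw, pitch) and the function name `predict_ahead` are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical sample layout: (timestamp in seconds, yaw in rad, pitch in rad).
Sample = Tuple[float, float, float]


def predict_ahead(history: List[Sample], horizon_s: float) -> Tuple[float, float]:
    """Extrapolate yaw and pitch `horizon_s` seconds past the newest sample.

    Constant-velocity stand-in for the learned trajectory prediction model.
    """
    if not history:
        raise ValueError("history must contain at least one sample")
    t1, yaw1, pitch1 = history[-1]
    if len(history) < 2:
        return yaw1, pitch1  # no trend available yet
    t0, yaw0, pitch0 = history[-2]
    dt = t1 - t0
    if dt <= 0:
        return yaw1, pitch1  # duplicate or out-of-order timestamps
    yaw_rate = (yaw1 - yaw0) / dt
    pitch_rate = (pitch1 - pitch0) / dt
    return yaw1 + yaw_rate * horizon_s, pitch1 + pitch_rate * horizon_s
```

A receiver that has not received a fresh packet can keep moving the cursor along the predicted values for a short time instead of freezing it.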

[0029] Step S104: Obtain the bandwidth characteristics of the current transmission channel, combine the predicted trajectory data, perform frame rate adaptation and motion smoothing on the standardized attitude vector, and generate an optimized instruction data packet. The system monitors the bandwidth characteristics of the current transmission channel (such as Wi-Fi or Bluetooth Low Energy) in real time, including available bandwidth, current latency, and jitter. Based on these network conditions, the system intelligently determines an appropriate frame rate (frame rate adaptation), for example, reducing the transmission frequency to avoid exacerbating congestion when the network is congested.

[0030] More importantly, motion smoothing is also performed: the obtained predicted trajectory data is compared with the generated real-time standardized attitude vector. If the real-time data deviates from the predicted trajectory due to sensor noise or sudden small jitters, the smoothing algorithm will correct it. Alternatively, when the predicted data is more representative of the smooth motion trend, the system may prioritize the predicted data or perform a weighted fusion with the real-time data to generate an interpolated frame.

[0031] Ultimately, all these processing results are packaged into an optimized instruction data packet. This data packet is no longer the raw sensor readings, but a refined, optimized, and network-adapted set of instructions designed to deliver the smoothest and most accurate cursor movement intentions with minimal data usage.
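The two decisions in step S104 can be illustrated with a short sketch. It is an assumption-laden example, not the method claimed: the function names, the rule of spending at most a fixed fraction of the measured bandwidth, the 15 to 120 Hz range, and the equal fusion weight are all hypothetical parameters chosen for the illustration.

```python
def choose_frame_rate(bandwidth_kbps: float, packet_bytes: int,
                      min_hz: int = 15, max_hz: int = 120,
                      budget: float = 0.5) -> int:
    """Pick the highest send rate that uses at most `budget` of the bandwidth."""
    bits_per_packet = packet_bytes * 8
    affordable_hz = (bandwidth_kbps * 1000 * budget) / bits_per_packet
    return int(max(min_hz, min(max_hz, affordable_hz)))


def smooth(measured: float, predicted: float,
           jitter_threshold: float, weight: float = 0.5) -> float:
    """Blend a measurement with the prediction when they nearly agree.

    Small deviations are treated as sensor noise and pulled toward the
    predicted trajectory; large deviations are treated as genuine motion
    and the measurement is passed through unchanged.
    """
    if abs(measured - predicted) <= jitter_threshold:
        return weight * predicted + (1.0 - weight) * measured
    return measured
```

On a congested link the chosen rate falls toward `min_hz`, which reduces the load the sender itself places on the channel.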

[0032] Step S105: Obtain user operation event data, send the optimization instruction data packet to the remote receiving end through the current transmission channel, and perform hierarchical local feedback based on the user operation event data; This step separates and decouples operation from feedback, a key design feature for improving user experience in high-latency scenarios. Specifically, user actions (such as clicks and swipes on a touchscreen) and device spatial movement are two distinct events. However, traditional remote interaction requires waiting for the operation command to be transmitted to the remote end, executed, and then feedback to be sent back, resulting in a significant perceived operation delay.

[0033] In this embodiment, the two are processed separately. The optimized instruction data packet (containing cursor movement information) is sent to the remote receiving end as usual. At the same time, the user operation event data is processed immediately on the local mobile terminal to trigger the corresponding "hierarchical local feedback". For example, different events such as a single tap, a double tap, and a long press activate haptic vibration of different intensities and patterns.

[0034] This means that users can receive precise tactile confirmation via phone vibration the instant they press the screen, sensing that "the command has been issued," without waiting for an actual response from a remote large screen. This reduces the user's subjective perception of network latency and provides an instant and definite interactive experience, which is especially crucial in scenarios where the user's back is to the screen (such as when singing karaoke) or where network latency is high.
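The decoupling of feedback from transmission can be sketched as below. The event names, the pattern values in `FEEDBACK_LEVELS`, and the `send` and `vibrate` callbacks are all hypothetical; on a real device `vibrate` would wrap the platform haptics API and `send` would be a non-blocking network write.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class HapticPattern:
    pulses: int        # number of vibration pulses
    duration_ms: int   # length of each pulse
    amplitude: float   # relative strength, 0.0 to 1.0


# Assumed mapping from event type to feedback level.
FEEDBACK_LEVELS: Dict[str, HapticPattern] = {
    "single_tap": HapticPattern(pulses=1, duration_ms=20, amplitude=0.4),
    "double_tap": HapticPattern(pulses=2, duration_ms=20, amplitude=0.6),
    "long_press": HapticPattern(pulses=1, duration_ms=80, amplitude=1.0),
}


def handle_user_event(event: str, packet: bytes,
                      send: Callable[[bytes], None],
                      vibrate: Callable[[HapticPattern], None]) -> None:
    """Confirm the event locally first, then forward the packet."""
    pattern = FEEDBACK_LEVELS.get(event)
    if pattern is not None:
        vibrate(pattern)  # local and immediate, independent of the network
    send(packet)          # remote execution proceeds on its own timeline
```

Because the vibration never waits on a round trip, the confirmation the user feels is unaffected by how slow the link is.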

[0035] Step S106: Monitor the network quality indicators of the current transmission channel. When the network packet loss rate exceeds the preset packet loss rate threshold, generate a session state snapshot containing the cursor position and event sequence, synchronize it to the backup transmission channel, and update the current transmission channel to the backup transmission channel. In this embodiment, the system continuously monitors network quality metrics (such as packet loss rate and latency) of the current channel. A packet loss rate above the preset threshold indicates a severe deterioration in the quality of the current channel and that a channel switch is imminent. At this point, the system does not wait but immediately captures the current session state and generates a session state snapshot. This snapshot is a lightweight data structure containing core contextual information. It not only records the precise position of the cursor on the remote screen but may also include a sequence of user operation events that have not yet been confirmed by the remote end. In effect, it preserves the minimum set of information needed to restore the interactive session on another channel. This ensures the continuity of the user experience after channel switching and avoids interaction interruptions caused by a lost cursor position or missing operation commands.

[0036] Subsequently, the system immediately uses the still available link (or a backup channel established directly between the mobile terminal and the receiving terminal, such as switching from Wi-Fi to BLE) to synchronize this snapshot data to the remote receiving terminal. After synchronization is complete, the system officially updates the current transmission channel to the backup transmission channel. Since the backup transmission channel has already received the snapshot information used to restore the scene, when subsequent optimization instruction data packets begin to be transmitted through the new channel, the remote receiving terminal can quickly reconstruct the interaction context using the snapshot data. This makes the channel switch almost imperceptible to the user, achieving a true "hot switch" rather than a "cold start," greatly enhancing the system's reliability in fluctuating network environments.
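The snapshot-then-switch sequence of step S106 can be sketched as follows. The class names, the sliding window of 50 packets, the 20% threshold, and the use of a dictionary as a stand-in for transmitting the snapshot over the backup link are assumptions for illustration only.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Tuple


@dataclass
class SessionSnapshot:
    cursor_xy: Tuple[int, int]   # cursor position on the remote screen
    pending_events: List[str]    # events not yet confirmed by the remote end


class LossMonitor:
    """Packet loss rate over a sliding window of recent send results."""

    def __init__(self, window: int = 50) -> None:
        self._results: Deque[bool] = deque(maxlen=window)

    def record(self, delivered: bool) -> None:
        self._results.append(delivered)

    def loss_rate(self) -> float:
        if not self._results:
            return 0.0
        return self._results.count(False) / len(self._results)


class ChannelManager:
    def __init__(self, active: str, backup: str, loss_threshold: float = 0.2) -> None:
        self.active = active
        self.backup = backup
        self.loss_threshold = loss_threshold
        # Stand-in for snapshots delivered over each channel.
        self.synced: Dict[str, SessionSnapshot] = {}

    def maybe_switch(self, monitor: LossMonitor, cursor_xy: Tuple[int, int],
                     pending_events: List[str]) -> bool:
        """Snapshot the session and hot-switch if loss exceeds the threshold."""
        if monitor.loss_rate() <= self.loss_threshold:
            return False
        snapshot = SessionSnapshot(cursor_xy, list(pending_events))
        self.synced[self.backup] = snapshot                   # 1. sync snapshot first
        self.active, self.backup = self.backup, self.active  # 2. then swap roles
        return True
```

Synchronizing before swapping is the ordering the description relies on: the new channel already holds the context when the first instruction packet arrives on it.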

[0037] Step S107: Receive optimization instruction data packets and session state snapshots at the remote receiving end, parse and render the screen cursor trajectory, and generate an updated session state snapshot based on the cursor trajectory behavior characteristics. In this process, the remote receiver (such as a smart TV) receives data packets through the new channel, parses the gestures or cursor movement instructions within them, and renders them as cursor trajectories on the screen. Simultaneously, the receiver does not passively execute commands; it actively analyzes the behavioral characteristics of the cursor trajectory (such as acceleration and rate of change of direction) to determine the current operational intent (whether it's precise pointing or rapid swiping). When a specific behavioral pattern is identified (such as three consecutive accelerations exceeding a threshold, which can be labeled as "rapid swiping"), the receiver generates an updated session state snapshot. This new snapshot not only includes the current cursor position and event state but also adds behavioral feature tags, making the session state description richer and more intelligent, providing data support for more accurate state recovery and possible personalized adaptation.
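The behavior tagging rule mentioned above (three consecutive accelerations exceeding a threshold) can be written as a short sketch. The function name, the tag strings, and the choice of "precise_pointing" as the default tag are assumptions for illustration.

```python
from typing import Iterable


def tag_behavior(accelerations: Iterable[float], threshold: float,
                 run_length: int = 3) -> str:
    """Label a cursor trajectory from its per-frame acceleration magnitudes."""
    run = 0
    for a in accelerations:
        # Count consecutive frames above the threshold; any calm frame resets.
        run = run + 1 if abs(a) > threshold else 0
        if run >= run_length:
            return "rapid_swipe"
    return "precise_pointing"
```

The resulting tag would be stored alongside the cursor position and event state in the updated session state snapshot.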

[0038] Step S108: Store the updated session state snapshot to the trajectory tracking database of the mobile control terminal for state recovery during subsequent transmission channel switching.

[0039] The updated session state snapshot is sent back to the mobile control terminal and stored in a local trajectory tracking database. This database effectively forms an archive of user interaction history. It not only serves to restore the state during subsequent channel switching (such as switching from Bluetooth Low Energy back to Wi-Fi), ensuring consistency regardless of the direction of the switch, but also provides the system with a data foundation for long-term learning of user operating habits. By analyzing historical data, the system can further optimize the accuracy of the trajectory prediction model, and may even implement intelligent transmission channel optimization strategies based on user habits in the future, enabling the entire system to continuously self-optimize.
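A local trajectory tracking database of this kind could be as simple as the sketch below, which uses SQLite from the Python standard library. The table layout, the JSON payload format, and the function names are assumptions; the application does not specify a storage engine or schema.

```python
import json
import sqlite3
from typing import Any, Dict, Optional


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the snapshot store; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snapshots ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "created_at REAL NOT NULL, "
        "payload TEXT NOT NULL)"
    )
    return conn


def store_snapshot(conn: sqlite3.Connection, created_at: float,
                   snapshot: Dict[str, Any]) -> None:
    """Append one updated session state snapshot."""
    conn.execute(
        "INSERT INTO snapshots (created_at, payload) VALUES (?, ?)",
        (created_at, json.dumps(snapshot)),
    )
    conn.commit()


def latest_snapshot(conn: sqlite3.Connection) -> Optional[Dict[str, Any]]:
    """Return the newest snapshot, used to restore state after a switch."""
    row = conn.execute(
        "SELECT payload FROM snapshots ORDER BY id DESC LIMIT 1"
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Keeping every snapshot rather than overwriting the last one is what allows the history to double as training data for the prediction model.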

[0040] In the above embodiments, a dynamic sensor adaptation mechanism is used to readily address hardware limitations and environmental interference. Intelligent trajectory prediction and network adaptation technologies ensure the fluency and real-time performance of control commands. Furthermore, an efficient multi-channel hot-switching and state management mechanism ensures the continuity and stability of the interaction process. Ultimately, this application transforms remote pointing interaction, which was originally constrained by the hardware and software environment, into a reliable service that combines hardware robustness, network adaptability, and user-friendly experience, achieving stable and smooth control effects in complex real-world environments.

[0041] Referring to Figure 2, as one implementation of step S102, the steps of collecting the raw sensor data stream from the mobile control terminal, performing attitude calculation based on the working mode identifier, and generating a standardized attitude vector include: Step S201: Collect three-axis angular velocity data, three-axis acceleration data, and three-axis magnetic field data from the mobile control terminal, perform timestamp alignment and unit normalization on each, and generate a synchronous sensor data stream. In mobile devices, gyroscopes, accelerometers, and magnetometers are independently operating sensors with different physical characteristics and operating parameters. Gyroscopes output angular velocity, reflecting the instantaneous speed of the device's rotation, and typically have the highest sampling rate (e.g., 200Hz). Accelerometers output linear acceleration, including gravitational acceleration, and their sampling rate may vary (e.g., 100Hz). Magnetometers output the ambient magnetic field strength, and their sampling rate is often the lowest (e.g., 50Hz). This heterogeneity means that the raw data points as read do not correspond exactly in time, and their physical units differ (e.g., degrees per second, meters per second², microtesla).

[0042] Specifically, timestamp alignment uses interpolation algorithms (such as linear interpolation or spline interpolation) to unify all this data onto the same high-precision time base, creating simultaneity conditions for subsequent sensor data fusion and avoiding calculation errors caused by asynchronous data times. The subsequent "unit normalization" process unifies all data to the International System of Units (SI) (e.g., angular velocity in radians / second, acceleration in meters / second², and magnetic field in microtesla), eliminating dimensional differences and enabling data on different physical quantities to be weighted, compared, and fused within a unified mathematical framework.
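Step S201 can be illustrated with a small sketch that resamples one sensor axis onto a common time base by linear interpolation and converts units. The helper names are hypothetical, each axis is handled as a separate scalar series for brevity, and the assumption that the accelerometer reports in units of g is made only to show a conversion.

```python
import math
from bisect import bisect_right
from typing import List, Sequence

STANDARD_GRAVITY = 9.80665  # m/s^2, conventional standard value


def interpolate(times: Sequence[float], values: Sequence[float], t: float) -> float:
    """Linearly interpolate one axis at time t, clamping outside the range."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_right(times, t)  # first index with times[i] > t
    t0, t1 = times[i - 1], times[i]
    w = (t - t0) / (t1 - t0)
    return values[i - 1] + w * (values[i] - values[i - 1])


def align(times: Sequence[float], values: Sequence[float],
          base_times: Sequence[float]) -> List[float]:
    """Resample a sensor axis onto the shared time base."""
    return [interpolate(times, values, t) for t in base_times]


def deg_per_s_to_rad_per_s(x: float) -> float:
    return math.radians(x)


def g_to_m_per_s2(x: float) -> float:
    return x * STANDARD_GRAVITY
```

For example, a 50 Hz magnetometer axis sampled at 0 s and 0.02 s can be resampled onto the 200 Hz gyroscope time base with `align(times, values, [0.0, 0.005, 0.01, 0.015, 0.02])`.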

[0043] Step S202: Assign attitude calculation strategies according to the working mode identifier and generate attitude quaternions; The essence of attitude calculation is to estimate the orientation of the device in three-dimensional space based on a series of observations, while the operating mode identifier determines the quantity and quality of available observations.

[0044] In some embodiments, if the nine-axis mode is identified, the synchronous sensor data stream is input into a quaternion extended Kalman filter to output a preliminary attitude quaternion; if the six-axis mode is identified, the gyroscope data is subjected to heading angle drift suppression based on gravity vector constraints to output an attitude quaternion without a magnetometer; if the three-axis mode is identified, the relative attitude quaternion is directly calculated by integrating the gyroscope angular velocity. Together, these three strategies provide the best attitude estimate available under each hardware condition.

[0045] Specifically, in the ideal "nine-axis mode," the system has data from all three types of sensors. In this case, a quaternion extended Kalman filter (EKF) is employed as the estimator. The Kalman filter combines the system's dynamic model (driven by gyroscope data) with the observation model (provided by accelerometer and magnetometer data) and recursively alternates prediction and correction to fuse all of the information. The EKF can handle the nonlinearity of attitude calculation. The gravity direction provided by the accelerometer corrects pitch and roll drift, while the geomagnetic north reference provided by the magnetometer corrects yaw drift, so the output is a preliminary attitude quaternion with an absolute heading reference and long-term stability.

[0046] When in "six-axis mode" (magnetometer failure or absence), the system loses its geomagnetic reference and can no longer correct for heading drift. The strategy here is to suppress heading drift by constraining gyroscope data using gravity vector constraints. Although an absolute heading cannot be obtained, the gravity vector provided by the accelerometer can still very accurately constrain and correct the pitch and roll drift generated by gyroscope integration. For the heading angle, the reality of its slow drift over time is acknowledged, but the speed of this drift is suppressed through algorithms (such as using accelerometer data to partially constrain gyroscope bias estimates), keeping it usable for a short period. This strategy sacrifices absolute orientation accuracy but gains stability in environments with magnetic field interference.
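As a rough illustration of how a gravity vector can constrain gyroscope drift in pitch and roll, the sketch below uses a complementary filter. The complementary filter is a stand-in chosen for brevity, not the algorithm claimed in the application; the blend factor `alpha`, the axis sign conventions, and the bias value are assumptions.

```python
import math

def accel_to_pitch_roll(ax, ay, az):
    """Pitch and roll (radians) implied by the measured gravity direction."""
    pitch = math.atan2(-ax, math.sqrt(ay * ay + az * az))
    roll = math.atan2(ay, az)
    return pitch, roll

def complementary_step(angle, gyro_rate, accel_angle, dt, alpha=0.98):
    """Blend the gyro-integrated angle with the gravity-derived angle."""
    return alpha * (angle + gyro_rate * dt) + (1.0 - alpha) * accel_angle

def simulate_stationary(bias_rad_s=0.01, dt=0.01, steps=1000):
    """Stationary device whose pitch gyro reports a constant bias."""
    fused, gyro_only = 0.0, 0.0
    for _ in range(steps):
        acc_pitch, _roll = accel_to_pitch_roll(0.0, 0.0, 9.80665)
        fused = complementary_step(fused, bias_rad_s, acc_pitch, dt)
        gyro_only += bias_rad_s * dt   # pure integration accumulates the bias
    return fused, gyro_only

fused_pitch, integrated_pitch = simulate_stationary()
```

In this simulation the fused pitch stays bounded while the purely integrated pitch keeps growing. Gravity carries no heading information, so yaw is not corrected by this step, which matches the slow heading drift acknowledged above.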

[0047] In the most basic "three-axis mode," the system relies solely on the gyroscope. At this point, attitude calculation degenerates into a pure gyroscope angular velocity integral, meaning the relative attitude change is calculated by integrating the angular velocity over time. This method has the lowest computational cost, but the inherent zero-point drift and noise of the gyroscope accumulate during the integration process, causing the attitude estimate to diverge rapidly. Therefore, it can only be used for measuring relative attitude changes over short periods.
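A minimal sketch of pure gyroscope integration is given below. It assumes quaternions stored as (w, x, y, z), body-frame angular velocity in radians per second, and a constant rate within each sample interval; the helper names are hypothetical.

```python
import math

def quat_multiply(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def integrate_gyro(q, omega, dt):
    """Advance attitude q by angular velocity omega (rad/s) over dt seconds."""
    wx, wy, wz = omega
    rate = math.sqrt(wx * wx + wy * wy + wz * wz)
    angle = rate * dt
    if angle < 1e-12:
        return q                                 # no measurable rotation
    s = math.sin(angle / 2.0) / rate
    dq = (math.cos(angle / 2.0), wx * s, wy * s, wz * s)
    q = quat_multiply(q, dq)
    norm = math.sqrt(sum(c * c for c in q))      # renormalize against round-off
    return tuple(c / norm for c in q)

# Turning at 90 degrees per second about Z for one second, sampled at 100 Hz.
q = (1.0, 0.0, 0.0, 0.0)
for _ in range(100):
    q = integrate_gyro(q, (0.0, 0.0, math.pi / 2.0), 0.01)
```

Any constant bias in `omega` is integrated along with the true rate, which is why the estimate is only usable over short periods.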

[0048] Step S203: Perform coordinate system transformation on the attitude quaternion to map the attitude quaternion in the device coordinate system to the screen reference coordinate system; The attitude quaternion describes the rotation of the device's body coordinate system (usually defined as X-axis along the long side of the phone, Y-axis along the short side of the phone, and Z-axis perpendicular to the screen outwards) relative to the Earth coordinate system (usually defined as east-north-sky). However, users are concerned with the movement of the cursor in the screen coordinate system (usually defined as X-axis horizontal to the right, Y-axis vertically upwards, and Z-axis pointing towards the observer). Therefore, coordinate system transformation is necessary.

[0049] Specifically, this coordinate system transformation is typically achieved using a fixed "calibration quaternion," which defines the relative rotation between the device coordinate system and the screen coordinate system when the device is in its initial calibration posture (e.g., the user points the phone horizontally towards the center of the screen). Through quaternion multiplication, the device's current posture quaternion is transformed into the screen coordinate system, allowing a single quaternion to directly represent "the device's current posture relative to its initial calibration posture, in the screen coordinate system." This step accurately maps the device's physical rotation to the screen's virtual space, laying the mathematical foundation for subsequently converting the rotation into cursor displacement.
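The calibration step can be sketched as a single quaternion product. The example assumes (w, x, y, z) ordering and that the relative attitude is obtained by left-multiplying with the conjugate of the calibration quaternion; the multiplication order depends on the frame conventions in use and is an assumption here.

```python
import math

def quat_conjugate(q):
    w, x, y, z = q
    return (w, -x, -y, -z)

def quat_multiply(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def relative_to_calibration(q_cal, q_now):
    """Attitude of the device relative to its calibration posture."""
    return quat_multiply(quat_conjugate(q_cal), q_now)

def yaw_quat(degrees):
    """Unit quaternion for a rotation of `degrees` about the Z axis."""
    half = math.radians(degrees) / 2.0
    return (math.cos(half), 0.0, 0.0, math.sin(half))

# Calibrated while heading 45 degrees, now heading 75 degrees:
# the relative attitude is a 30 degree turn.
q_rel = relative_to_calibration(yaw_quat(45.0), yaw_quat(75.0))
```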

[0050] Step S204: Extract the attitude angle vector after coordinate system transformation, including pitch angle, roll angle and yaw angle, and generate a standardized attitude vector.

[0051] For control logic, Euler angles (i.e., pitch, roll, and yaw) are more intuitive, as they decompose complex three-dimensional rotations into three independent, physically meaningful rotations around axes. These Euler angles can be extracted from the attitude quaternions in the screen coordinate system using a defined mathematical formula.

[0052] Specifically, the pitch angle typically represents the device's up-and-down head movement, mapped to the vertical movement of the cursor on the screen; the roll angle represents the device's left-and-right tilt movement, which may not be directly used for cursor control or as an auxiliary function in this scheme; and the yaw angle represents the device's left-and-right turning movement, mapped to the horizontal movement of the cursor on the screen. This generated standardized attitude vector containing three angles becomes a well-defined, fixed-dimensional data interface.
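The extraction and mapping above can be sketched as follows. The formulas are the common Z-Y-X (yaw, pitch, roll) conversion for a (w, x, y, z) quaternion; the angular range of plus or minus 30 degrees and the sign of the mapping onto the normalized cursor position are assumptions for illustration only.

```python
import math

def quat_to_euler(q):
    """Return (pitch, roll, yaw) in radians using the Z-Y-X convention."""
    w, x, y, z = q
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    sin_pitch = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))  # guard asin domain
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return pitch, roll, yaw

def attitude_to_cursor(pitch, yaw, half_range=math.radians(30.0)):
    """Map yaw to horizontal and pitch to vertical position in [0, 1]."""
    x = 0.5 + yaw / (2.0 * half_range)
    y = 0.5 + pitch / (2.0 * half_range)
    return min(1.0, max(0.0, x)), min(1.0, max(0.0, y))

# A 15 degree turn about Z moves the cursor a quarter of the screen width.
half = math.radians(15.0) / 2.0
pitch, roll, yaw = quat_to_euler((math.cos(half), 0.0, 0.0, math.sin(half)))
cursor = attitude_to_cursor(pitch, yaw)
```

Roll is returned but not used for cursor placement, consistent with its auxiliary role described above.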

[0053] In the above embodiments, the dynamic strategy allocation mechanism can cope with the challenges of different hardware configurations and environmental interference, ensuring the basic availability and robustness of the system. Furthermore, through coordinate system transformation and data normalization, it provides high-quality, low-noise input for downstream trajectory prediction, network transmission, and cursor rendering, which is the foundation for this application to achieve high precision, low latency, and a smooth experience.

[0054] Referring to Figure 3, as one implementation of step S103, the steps of training a trajectory prediction model based on historical attitude vector sequences, inputting standardized attitude vectors into the trajectory prediction model, and generating predicted trajectory data include: Step S301: Load historical attitude vector sequences from the pre-stored trajectory tracking database and construct a training dataset by sampling according to time windows. The trajectory tracking database stores a sequence of standardized attitude vectors generated over time, recording the spatial trajectory of the screen cursor as the user interacted with the mobile control terminal. However, the raw time-series data is continuous and lengthy, making it inefficient for direct training and difficult to capture local patterns. Therefore, a time window sampling method is needed to construct a structured training dataset.

[0055] Specifically, using each historical moment in the database as a benchmark, a fixed-length (e.g., 500 milliseconds) data segment is extracted as the model's input features, and a shorter data segment (e.g., 100 milliseconds) following that moment is extracted as the prediction target (i.e., labeled data) that the model needs to learn. Through this sliding window approach, vast amounts of historical data can be transformed into tens of thousands of independent "input-output" sample pairs. Each sample pair implicitly contains local patterns for inferring future trajectories from past trajectories under specific motion patterns (e.g., slow movement, fast sliding, stable pointing).
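The sliding-window sampling can be sketched as below. The example assumes a 100 Hz attitude stream, so that 500 milliseconds corresponds to 50 samples and 100 milliseconds to 10 samples; the function name and the stride parameter are hypothetical.

```python
def make_windows(sequence, input_len, target_len, stride=1):
    """Split a time-ordered sequence into (input, target) sample pairs.

    Each input holds `input_len` consecutive items and each target holds the
    `target_len` items that immediately follow it.
    """
    pairs = []
    last_start = len(sequence) - input_len - target_len
    for start in range(0, last_start + 1, stride):
        split = start + input_len
        pairs.append((sequence[start:split], sequence[split:split + target_len]))
    return pairs

# One second of (pitch, roll, yaw) vectors at an assumed 100 Hz.
history = [(0.001 * i, 0.0, 0.002 * i) for i in range(100)]
pairs = make_windows(history, input_len=50, target_len=10)
```

With 100 samples this yields 41 overlapping pairs; a larger stride trades sample count for less redundancy between neighboring windows.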

[0056] Step S302: Initialize the gated recurrent neural network model structure and use the training dataset as input for time series feature extraction. In this embodiment, a gated recurrent neural network is used. Its core mechanism lies in its internal "gating" structure (such as the input, forget, and output gates of a long short-term memory network, or the update and reset gates of a gated recurrent unit). These gates act as an adjustment system that learns which long-term information to retain, which irrelevant short-term noise to forget, and how to use the retained content in the current calculation.

[0057] When the constructed training dataset is input into the initialized gated recurrent neural network, the network does not process the entire time window of data at once, but rather processes each pose vector in the sequence step by step and in chronological order. During processing, the hidden states within the network are continuously updated, acting like a memory unit, accumulating and integrating all relevant information from the beginning of the sequence to the current moment. This process is time series feature extraction, and its output is a high-level, abstract feature representation of the input sequence. This representation encodes the dynamic characteristics of the user's operation trajectory (such as the changing trends of velocity and acceleration), rather than just the pose information at a single time point.
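To make the step-by-step update of the hidden state concrete, the sketch below implements one gated recurrent unit with a scalar input and a scalar hidden state. Real models use vectors and matrices; the scalar form, the parameter names, and the parameter values are simplifications chosen for illustration, not the network described in the application.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x, h, p):
    """One GRU update for scalar input x and scalar hidden state h."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])      # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])      # reset gate
    h_candidate = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])
    return (1.0 - z) * h + z * h_candidate                # blend old and new

def encode_sequence(xs, p, h0=0.0):
    """Process a sequence in chronological order and return the final state."""
    h = h0
    for x in xs:
        h = gru_step(x, h, p)
    return h

params = {"wz": 0.5, "uz": 0.1, "bz": 0.0,
          "wr": 0.3, "ur": 0.2, "br": 0.0,
          "wh": 0.8, "uh": 0.4, "bh": 0.0}
feature = encode_sequence([0.1, 0.2, 0.4], params)
```

When the update gate is driven toward zero the previous state passes through almost unchanged, which is how the unit retains information across many time steps.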

[0058] Step S303: Optimize network weight parameters using gradient descent algorithm to generate trajectory prediction model; The weight parameters of the initialized neural network are randomly set, so its predictive ability is almost zero. The gradient descent algorithm is the key optimizer that drives the model's learning.

[0059] Specifically, first, the model makes predictions on the training dataset based on the current parameters and compares the predictions with the actual future trajectories (i.e., labeled data), quantifying the magnitude of the prediction error using a pre-defined "loss function." Then, the algorithm calculates the gradient of this loss function with respect to each weight parameter of the network; the gradient indicates the direction and magnitude by which each parameter should be adjusted to reduce the prediction error.

[0060] Next, all parameters are updated in tiny increments along the opposite direction of their gradients. This process is repeated countless times across the entire training dataset, with each iteration improving the network's predictive ability slightly. The ultimate goal is to find an optimal set of weight parameters that allows the model to predict the future based on historical data with minimal overall error. When the training process converges and the loss function stabilizes near its minimum, a trajectory prediction model with strong temporal prediction capabilities is obtained.
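The loop of predicting, measuring the loss, computing gradients, and stepping against them can be shown on a deliberately small model. The sketch below fits a two-parameter linear predictor with a mean squared error loss; the linear model replaces the recurrent network purely to keep the example short, and the learning rate and step count are assumptions.

```python
def train_linear(xs, ys, lr=0.1, steps=2000):
    """Fit y = a * x + b by gradient descent on the mean squared error."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [a * x + b - y for x, y in zip(xs, ys)]
        grad_a = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n  # dL/da
        grad_b = 2.0 * sum(errors) / n                             # dL/db
        a -= lr * grad_a     # move against the gradient
        b -= lr * grad_b
    return a, b

def mse(a, b, xs, ys):
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [2.0 * x + 1.0 for x in xs]      # data generated by a = 2, b = 1
a, b = train_linear(xs, ys)
```

For a recurrent network the gradients are obtained by backpropagation through time rather than in closed form, but the parameter update has the same shape.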

[0061] Step S304: Input the standardized attitude vectors acquired in real time into the trajectory prediction model and output the attitude coordinate prediction values for the future time window. Specifically, the system leverages the knowledge learned by the model to make forward-looking inferences about current user behavior. During real-time interaction, the system continuously adds the latest standardized attitude vectors to the most recent historical sequence and immediately inputs this short real-time sequence into a pre-trained trajectory prediction model. Based on the trajectory evolution patterns learned from massive amounts of historical data, the model calculates the device's attitude coordinates within a short future time window (e.g., 50-200 milliseconds). This predicted attitude coordinate is not a fixed point, but rather a probability distribution representing the most likely future trajectory as perceived by the model based on current movement trends. This prediction is crucial because it provides the system with an extremely short look-ahead window, enabling it to anticipate upcoming movements and prepare for network latency and data loss.

[0062] Step S305: Integrate the current standardized attitude vector and attitude coordinate prediction values to generate predicted trajectory data that includes temporal relationships.

[0063] This process integrates the predicted attitude coordinates for future time windows with the real-time acquired standardized attitude vectors. This integration is not merely a simple data concatenation; it involves arranging the data in strict chronological order to form a continuous data stream. For example, the output predicted trajectory data might be a data structure that begins with the exact attitude at the current moment, followed by predicted attitudes for several future time points. This data structure explicitly includes temporal relationships; each data point carries its corresponding timestamp, clearly describing the complete motion trajectory from the present to the near future.
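One possible shape for such a data structure is sketched below. The field names, the fixed prediction step, and the `predicted` flag are hypothetical; the application does not specify a concrete layout.

```python
from dataclasses import dataclass

@dataclass
class TrajectoryPoint:
    timestamp_ms: int
    pitch: float
    roll: float
    yaw: float
    predicted: bool      # False for the measured point, True for model output

def build_predicted_trajectory(now_ms, current, predictions, step_ms):
    """Place the measured attitude first, then predictions in time order."""
    points = [TrajectoryPoint(now_ms, *current, predicted=False)]
    for i, attitude in enumerate(predictions, start=1):
        points.append(TrajectoryPoint(now_ms + i * step_ms, *attitude, predicted=True))
    return points

trajectory = build_predicted_trajectory(
    now_ms=1000,
    current=(0.10, 0.00, 0.20),
    predictions=[(0.11, 0.00, 0.22), (0.12, 0.00, 0.24)],
    step_ms=50,
)
```

Keeping the measured point and the predictions in one timestamped list lets downstream smoothing treat them as a single ordered stream while still telling them apart.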

[0064] Understandably, this predictive trajectory data, rich in spatiotemporal information, provides a basis for decision-making for subsequent frame rate adaptation and motion smoothing modules, enabling them to intelligently generate interpolated frames or perform data compensation, thereby ensuring the smoothness of cursor movement even when network conditions are poor.

[0065] In the above implementation, the continuous user operation history is transformed into training samples with causal relationships that can be recognized by machine learning models. This enables the system to move beyond simply passively responding to real-time sensor data and instead gain the ability to predict the short-term future based on historical behavior. This predictive capability effectively combats cursor stuttering and jumping caused by network transmission latency and jitter, transforming the originally lagging and passive interactive experience into a smooth control experience. It is a key technological guarantee for improving system response speed and visual smoothness.

[0066] Referring to Figure 4, as one implementation of step S104, the steps of obtaining the bandwidth characteristics of the current transmission channel, combining the predicted trajectory data, performing frame rate adaptation and motion smoothing processing on the standardized attitude vector, and generating an optimized instruction data packet include: Step S401: Analyze the bandwidth characteristics of the current transmission channel and determine the target frame rate threshold. Different transmission channels (such as Wi-Fi and Bluetooth Low Energy) have drastically different physical bandwidths, protocol overheads, and typical latency characteristics. The system monitors the bandwidth characteristics of the current transmission channel in real time, which includes not only the theoretical maximum bandwidth but also the actual available throughput, packet round-trip latency, and network jitter at the current moment. Based on an analysis of these network conditions, the system determines a suitable target frame rate threshold. This threshold represents the highest allowable command transmission frequency under the current network conditions to maintain stable, low-latency transmission without causing network congestion or buffer backlog.

[0067] For example, in a high-quality Wi-Fi environment, the target frame rate can be set to a high 60Hz to pursue the ultimate smoothness; while on the bandwidth-limited Bluetooth Low Energy channel, it may be necessary to reduce the target frame rate to 30Hz or lower, prioritizing the reliability and real-time performance of the command, and avoiding the counterproductive effect of increased packet loss rate due to excessively fast data transmission. This decision ensures that the system can adaptively respond to the dynamic changes in the network environment.
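A simple way to derive a target frame rate from measured channel conditions is sketched below. The heuristic, the 50 percent bandwidth budget, the 10 to 60 Hz limits, and the 40-byte packet size are all assumptions made for illustration; the application does not give a formula.

```python
def target_frame_rate(throughput_bps, packet_bytes, loss_rate,
                      min_hz=10.0, max_hz=60.0, budget=0.5):
    """Pick a send rate that uses at most `budget` of the measured throughput.

    The rate is reduced in proportion to the observed packet loss and then
    clamped to the [min_hz, max_hz] range.
    """
    sustainable = throughput_bps * budget / (packet_bytes * 8)
    sustainable *= (1.0 - loss_rate)
    return max(min_hz, min(max_hz, sustainable))

wifi_hz = target_frame_rate(20_000_000, packet_bytes=40, loss_rate=0.0)
ble_hz = target_frame_rate(20_000, packet_bytes=40, loss_rate=0.04)
```

Under these assumed inputs the Wi-Fi case is capped at 60 Hz and the constrained channel lands near 30 Hz, mirroring the example figures above.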

[0068] Step S402: Based on the target frame rate threshold, resample the standardized pose vector sequence in the time dimension. The standardized pose vector sequence typically has a high frequency (e.g., 100Hz or 200Hz), which ensures the finesse of pose capture. However, the target frame rate threshold is usually lower than this original frequency. Sending all the high-frequency data directly would not only waste valuable bandwidth but could also exceed the network's capacity. Therefore, temporal resampling is required. This is not simply discarding data but an intelligent frame extraction process.

[0069] Specifically, the system uniformly and selectively extracts the most representative attitude vectors from the original high-frequency sequence according to the time interval corresponding to the target frame rate, forming a down-framed attitude sequence that matches the network bandwidth. This process is similar to frame rate conversion in video encoding, aiming to preserve key frame information of the motion trajectory while significantly reducing the amount of data that needs to be transmitted, thus creating conditions for smooth transmission within limited bandwidth.
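The down-framing step can be sketched as nearest-sample selection on a uniform output time grid. The selection rule is an assumption; the application only states that representative vectors are extracted at the interval corresponding to the target frame rate.

```python
def resample_to_rate(samples, target_hz):
    """Keep the sample nearest to each tick of a uniform target_hz time grid.

    `samples` is a time-ordered list of (timestamp_s, value) pairs.
    """
    if not samples:
        return []
    period = 1.0 / target_hz
    t_start, t_end = samples[0][0], samples[-1][0]
    out, i, k = [], 0, 0
    while True:
        tick = t_start + k * period
        if tick > t_end + 1e-9:
            break
        # Advance while the next sample is at least as close to this tick.
        while (i + 1 < len(samples)
               and abs(samples[i + 1][0] - tick) <= abs(samples[i][0] - tick)):
            i += 1
        out.append(samples[i])
        k += 1
    return out

# One second of data at 100 Hz reduced to 25 Hz.
source = [(n / 100.0, n) for n in range(100)]
reduced = resample_to_rate(source, target_hz=25.0)
```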

[0070] Step S403: Spatiotemporally align the predicted trajectory data with the resampled attitude vector sequence to generate a motion trajectory feature vector; The resampled attitude vector sequence reflects only past, discrete observation points, while the predicted trajectory data includes inferences about short-term future motion based on historical behavior. This step aligns these two in time and space. Time alignment involves synchronizing the timestamps of the predicted data with the latest time point of the resampled sequence to ensure they are on a unified timeline. Spatial alignment involves fusing the resampled attitude vector, representing the current exact location, with the predicted trajectory data, representing possible future locations, within a unified screen coordinate system. This fusion results in a motion trajectory feature vector that not only contains current location information but also integrates deeper features such as velocity, acceleration trends, and possible future paths.

[0071] Step S404: Perform nonlinear filtering on the motion trajectory feature vector to generate a smooth trajectory sequence; Even if the user's intention is to move in a straight line, the natural tremor of their hand and the background noise of the sensor will cause the original motion trajectory feature vector to contain high-frequency, small, irregular fluctuations.

[0072] In the embodiments of this application, nonlinear filtering, such as an extended Kalman filter or another Kalman filter variant, is an effective tool for this problem. Unlike simple linear averaging, nonlinear filtering can dynamically adjust the filtering parameters according to the state of motion (such as rapid sliding or fine pointing), and perform optimal estimation of the received trajectory feature vector through a framework that includes a system dynamics model and an observation model. It distinguishes changes that reflect the user's true operational intentions from random noise, and strongly suppresses the noise while smoothing the true motion trend through interpolation. The smooth trajectory sequence generated after this process can effectively eliminate the jaggedness and slight jumps when the cursor moves, making the cursor trajectory as natural and smooth as if it were passing through a physical inertial system, greatly improving the visual experience and control precision.
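The idea of adjusting the filter to the state of motion can be illustrated with a one-dimensional Kalman filter whose process noise grows with the squared innovation. This is a simplified stand-in under assumed noise parameters, applied to one coordinate at a time; it is not the specific filter of the application.

```python
class AdaptiveSmoother:
    """Scalar Kalman filter with innovation-dependent process noise."""

    def __init__(self, r=1e-3, q_base=1e-6, q_gain=0.01):
        self.r = r              # assumed measurement noise variance
        self.q_base = q_base    # process noise while the hand is steady
        self.q_gain = q_gain    # extra process noise per squared innovation
        self.x = None           # state estimate
        self.p = 1.0            # estimate variance

    def update(self, z):
        if self.x is None:      # first sample initializes the state
            self.x = z
            return self.x
        innovation = z - self.x
        q = self.q_base + self.q_gain * innovation * innovation
        p_pred = self.p + q
        gain = p_pred / (p_pred + self.r)
        self.x += gain * innovation
        self.p = (1.0 - gain) * p_pred
        return self.x

def smooth(values, **kwargs):
    f = AdaptiveSmoother(**kwargs)
    return [f.update(v) for v in values]

# Small tremor around 0.5 is suppressed; a deliberate jump to 0.9 is followed.
tremor = smooth([0.5 + (0.01 if i % 2 == 0 else -0.01) for i in range(200)])
jump = smooth([0.5] * 200 + [0.9] * 10)
```

Small innovations keep the gain low so tremor is averaged out, while a large innovation raises the gain for a few samples so that intentional motion is not delayed.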

[0073] Step S405: Encapsulate the smooth trajectory sequence and user operation event data to generate an optimization instruction data package.

[0074] The smooth trajectory sequence describes how the cursor moves, while user action event data (such as clicks, long presses, and swipes) expresses what the user wants to do. The encapsulation process follows a predefined communication protocol, assigning specific fields and encoding formats to different types of data, such as timestamps, coordinate sequences, and event type codes. For continuous and slowly changing coordinate sequences, differential encoding may be used to reduce the amount of data.
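A possible encapsulation with differential encoding is sketched below using Python's `struct` module. The field layout, the 12-bit coordinate grid, and the one-byte deltas are hypothetical choices; the application only states that a predefined protocol is followed and that differential encoding may be used.

```python
import struct

GRID = 4095            # normalized coordinates are quantized to 12 bits
HEADER = "<IBBHH"      # timestamp_ms, event code, point count, first x, first y
HEADER_SIZE = struct.calcsize(HEADER)

def encode_trajectory(timestamp_ms, event_code, points):
    """Pack a non-empty list of (x, y) in [0, 1] as one absolute point plus deltas."""
    q = [(round(x * GRID), round(y * GRID)) for x, y in points]
    out = struct.pack(HEADER, timestamp_ms, event_code, len(q), q[0][0], q[0][1])
    for (px, py), (cx, cy) in zip(q, q[1:]):
        dx, dy = cx - px, cy - py
        if not (-128 <= dx <= 127 and -128 <= dy <= 127):
            raise ValueError("delta exceeds one byte; send a new absolute point")
        out += struct.pack("<bb", dx, dy)
    return out

def decode_trajectory(data):
    """Inverse of encode_trajectory; returns (timestamp_ms, event_code, points)."""
    timestamp_ms, event_code, count, x, y = struct.unpack_from(HEADER, data, 0)
    cells = [(x, y)]
    for i in range(count - 1):
        dx, dy = struct.unpack_from("<bb", data, HEADER_SIZE + 2 * i)
        x, y = x + dx, y + dy
        cells.append((x, y))
    return timestamp_ms, event_code, [(cx / GRID, cy / GRID) for cx, cy in cells]

points = [(0.50, 0.50), (0.51, 0.50), (0.52, 0.49)]
packet = encode_trajectory(123456, 0x01, points)
decoded = decode_trajectory(packet)
```

Each point after the first costs two bytes instead of four, at the price of requiring a fresh absolute point whenever the cursor jumps further than one byte can express.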

[0075] The final optimized instruction data packet is a refined, smooth set of instructions that includes future prediction information and incorporates the user's operational intentions. It carries the richest control information with the smallest amount of data, ensuring efficient and reliable transmission to the remote receiving end even in fluctuating network environments, and ultimately presenting a precise, smooth, and timely interactive effect on the screen.

[0076] In the above implementation, the data output rate is adaptively adjusted based on network conditions, then artificial intelligence prediction information is integrated to enhance the robustness and foresight of commands, and advanced filtering algorithms are used to purify the motion trajectory, ultimately efficiently encapsulating all information. This series of processes enables the system to maximize the smoothness, accuracy, and response speed of remote cursor control under the constraints of limited, dynamically changing network bandwidth. It transforms network transmission from a passive and unreliable data transfer process into a proactive, intelligent, and quality-controllable process aimed at optimizing user experience, fundamentally improving the practicality and robustness of the remote gesture interaction system.

[0077] Referring to Figure 5, as one implementation of step S105, the step of performing hierarchical local feedback based on user operation event data includes: Step S501: Identify the type of the received user operation event data to determine the event type. The original user operations collected by the mobile control terminal (such as a mobile phone touch screen) are essentially a collection of time-series coordinate points, pressure values, and touch states (such as pressing, moving, and lifting).

[0078] In this embodiment, type recognition involves using a predefined event recognition logic to parse and categorize these raw, physical data streams into higher-order event types with clear interaction intentions. For example, a quick touch and release might be recognized as a "single tap"; a touch held beyond a certain time threshold before release might be recognized as a "long press"; and continuous coordinate movement is recognized as a "swipe." This recognition process goes beyond simple duration or distance judgment; it may involve more complex pattern matching, such as distinguishing whether a "swipe" is a rapid page-turning operation or a slow dragging operation. Through this step, the system standardizes the diverse physical operations of users into a limited number of clearly defined logical events, providing an accurate basis for subsequently triggering different levels of feedback.
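A threshold-based version of this recognition logic might look like the sketch below. The thresholds (500 ms, 20 px, 1 px/ms) and the event names are assumptions for illustration; a production recognizer would likely consider the whole path rather than only its endpoints.

```python
import math

LONG_PRESS_MS = 500          # assumed hold time for a long press
MOVE_THRESHOLD_PX = 20       # assumed travel distance that counts as a swipe
FAST_SWIPE_PX_PER_MS = 1.0   # assumed speed separating page turns from drags

def classify_touch(points):
    """Classify one touch from down to up; points are (t_ms, x_px, y_px)."""
    t0, x0, y0 = points[0]
    t1, x1, y1 = points[-1]
    duration = t1 - t0
    distance = math.hypot(x1 - x0, y1 - y0)
    if distance >= MOVE_THRESHOLD_PX:
        speed = distance / max(duration, 1)
        return "swipe_fast" if speed >= FAST_SWIPE_PX_PER_MS else "swipe_slow"
    return "long_press" if duration >= LONG_PRESS_MS else "single_tap"

tap = classify_touch([(0, 100, 100), (80, 102, 101)])
hold = classify_touch([(0, 100, 100), (700, 101, 100)])
flick = classify_touch([(0, 100, 100), (150, 400, 100)])
drag = classify_touch([(0, 100, 100), (1500, 400, 100)])
```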

[0079] Step S502: Assign a preset set of feedback rules based on the event type, including tactile vibration mode and visual display mode; Each event type is associated with a preset set of feedback rules. This set of rules is a data structure that defines in detail what kind of sensory feedback should be triggered when the event occurs and its specific parameters.

[0080] Specifically, the rule set mainly covers two modes: the haptic vibration mode defines the waveform, intensity, duration, and rhythm of the vibration motor driving the phone (for example, a single tap corresponds to a short, crisp vibration, while a long press to confirm may correspond to a continuous vibration with gradually changing intensity); the visual display mode defines the type, color, size, and duration of the animation displayed on the touch point of the screen (for example, a rapidly expanding concentric circle ripple appears when tapping, and a light trail follows the finger when swiping). The purpose of these preset rules is to ensure the consistency of interaction, that is, the same user operation always evokes the same feedback feeling, thereby enabling users to establish stable psychological expectations and immediately distinguish different operation results through differences in feedback.

[0081] Step S503: Based on the feedback rule set, trigger the corresponding tactile vibration signal and / or visual animation signal locally on the mobile control terminal; When an event type is identified and the corresponding feedback rule set is invoked, the system does not directly operate the hardware, but first generates control signals for the intermediate layer.

[0082] Specifically, for haptic feedback, the system generates a digital haptic vibration signal based on the haptic vibration patterns in the rule set. This signal contains all the parameters required to drive the waveform (such as amplitude, frequency, and duration sequence). For visual feedback, a visual animation signal is generated, which contains the identifiers of the animation resources to be rendered, the starting position, keyframe parameters, etc.

[0083] The key is that this triggering process is completed instantaneously on the mobile control device. It relies entirely on the mobile device's local computing resources and pre-loaded rule base, without involving any network requests or responses from remote servers. This localization is the fundamental guarantee for achieving instant feedback, ensuring that the triggering of feedback is highly synchronized with the user's operation in time, eliminating feedback lag caused by network transmission delays.
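The rule lookup and local signal generation of steps S502 and S503 can be sketched with an in-memory table. All rule contents, parameter names, and event names below are hypothetical; the point of the example is that the lookup involves no network access.

```python
FEEDBACK_RULES = {
    "single_tap": {
        "haptic": {"amplitude": 0.6, "duration_ms": 20},
        "visual": {"animation": "ripple", "duration_ms": 150},
    },
    "long_press": {
        "haptic": {"amplitude": 0.9, "duration_ms": 120},
        "visual": {"animation": "ring_fill", "duration_ms": 400},
    },
    "swipe_fast": {
        "visual": {"animation": "light_trail", "duration_ms": 200},
    },
}

def build_local_feedback(event_type, touch_x, touch_y):
    """Return (haptic_signal, visual_signal) for an event, or (None, None).

    Signals are plain dictionaries intended for an intermediate layer that
    later hands them to the vibration driver and the rendering pipeline.
    """
    rule = FEEDBACK_RULES.get(event_type)
    if rule is None:
        return None, None
    haptic = dict(rule["haptic"]) if "haptic" in rule else None
    visual = dict(rule["visual"], x=touch_x, y=touch_y) if "visual" in rule else None
    return haptic, visual

haptic_signal, visual_signal = build_local_feedback("single_tap", 320, 640)
```

Because the table is loaded in advance, the same event type always yields the same signals, which supports the consistency goal described above.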

[0084] Step S504: Execute the tactile vibration signal through the vibration motor drive unit, and render the visual animation signal through the touch screen display unit.

[0085] The generated tactile vibration signal is sent to the device's hardware abstraction layer, and is finally received and executed by the vibration motor drive unit. This drive unit converts the digital signal into precise current control, driving a linear resonant actuator (LRA) or an eccentric rotating mass (ERM) motor to generate mechanical vibrations of a specific frequency and intensity, thereby converting the electronic signal into tactile stimulation that the user's fingers can perceive.

[0086] The generated visual animation signals are submitted to the graphics rendering pipeline of the mobile operating system and rendered by the touch screen display unit. The graphics engine draws the corresponding animation frames in real time at the specified position on the screen (usually the touch point) according to the parameters in the signal (such as water ripple diffusion and color highlighting), and transmits the visual confirmation information to the user through light signals.

[0087] It should be noted that when both are generated simultaneously, the synergy of multimodal feedback is achieved. The superposition effect of tactile and visual feedback can greatly enhance the salience and richness of the feedback, providing users with a three-dimensional and reliable operation confirmation experience.

[0088] In the above embodiments, the confirmation feedback of user operation is decoupled from the network transmission path, so that the mobile control terminal can provide tactile and visual feedback that precisely matches the operation type locally at the same time as sending the operation command to the remote receiving terminal. This completely eliminates the operation delay perceived by the user while waiting for the remote response, and greatly improves the immediacy and certainty of the interaction.

[0089] In practical applications, this technical solution not only ensures smooth interaction when network conditions are poor, but also provides indispensable tactile guidance in scenarios where users cannot look directly at the screen (such as when demonstrating or singing karaoke with their backs to the screen), fundamentally enhancing the reliability and user satisfaction of the remote gesture interaction system.

[0090] Referring to Figure 6, as one implementation of step S107, the steps of receiving optimization instruction data packets and session state snapshots at the remote receiving end, parsing and rendering the screen cursor trajectory, and generating an updated session state snapshot based on the cursor trajectory behavior characteristics include: Step S601: Perform protocol parsing on the received optimization instruction data packet to extract the trajectory coordinate sequence and user operation events. The optimization instruction data packets sent by the mobile control terminal are data structures that are highly encapsulated and compressed to adapt to network transmission. They follow the private or public protocol formats predefined by both parties in the communication.

[0091] In this embodiment, protocol parsing involves reversing the encapsulation process according to the protocol specification to unpack the data packets. The parsing process includes verifying the integrity and correctness of the data packets (e.g., through checksum verification), and then identifying and separating different data segments within the packet. Its core task is to accurately extract two key pieces of information: first, a "trajectory coordinate sequence," which is an array of cursor position coordinates containing timestamp information, describing the expected movement path of the cursor within a short period of time in the future; and second, "user operation events," which are one or more logical instruction codes representing the user's interaction intent, such as "click to confirm" (0x01), "long press menu" (0x02), etc.
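Integrity checking followed by field extraction can be sketched as below. The packet layout and the use of CRC-32 are assumptions; only the event codes 0x01 and 0x02 are taken from the text above. A small builder is included so the example is self-contained.

```python
import struct
import zlib

EVENT_NAMES = {0x01: "click_confirm", 0x02: "long_press_menu"}
HEADER = "<IBB"                       # timestamp_ms, event code, point count
HEADER_SIZE = struct.calcsize(HEADER)

def build_packet(timestamp_ms, event_code, coords):
    """Hypothetical sender side: payload followed by its CRC-32."""
    payload = struct.pack(HEADER, timestamp_ms, event_code, len(coords))
    for x, y in coords:
        payload += struct.pack("<HH", x, y)
    return payload + struct.pack("<I", zlib.crc32(payload))

def parse_packet(packet):
    """Verify the checksum, then extract the coordinate sequence and event."""
    if len(packet) < HEADER_SIZE + 4:
        raise ValueError("packet too short")
    payload = packet[:-4]
    (expected_crc,) = struct.unpack("<I", packet[-4:])
    if zlib.crc32(payload) != expected_crc:
        raise ValueError("checksum mismatch")
    timestamp_ms, event_code, count = struct.unpack_from(HEADER, payload, 0)
    if len(payload) != HEADER_SIZE + 4 * count:
        raise ValueError("length does not match point count")
    coords = [struct.unpack_from("<HH", payload, HEADER_SIZE + 4 * i)
              for i in range(count)]
    return timestamp_ms, EVENT_NAMES.get(event_code, "unknown"), coords

packet = build_packet(5000, 0x01, [(2048, 2048), (2088, 2048)])
parsed = parse_packet(packet)
```

Rejecting a packet on checksum or length mismatch keeps a corrupted frame from moving the cursor or triggering an unintended event.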

[0092] Step S602: Map the trajectory coordinate sequence to the screen coordinate system to generate a real-time cursor position data stream; The parsed trajectory coordinate sequence is usually based on an abstract, normalized coordinate system (e.g., the range is [0,1]), which is to ensure that the sending end does not need to care about the different screen resolutions of the receiving end.

[0093] In this embodiment, the system converts the normalized coordinates into corresponding actual pixel coordinates by scaling calculations based on the specific resolution of the current display device (e.g., 1920x1080 pixels). For example, the normalized coordinates (0.5, 0.5) are mapped to the screen center point (960, 540). Furthermore, the mapping process may include processing of screen boundaries (to prevent the cursor from exceeding the visible area) and possible adjustments to the coordinate axis direction (to ensure that the movement direction of the mobile control terminal is consistent with the cursor movement direction on the screen). The resulting real-time cursor position data stream is a continuous sequence of screen pixel positions corresponding to timestamps. The graphics rendering engine can directly use this data stream to update the cursor position in each frame, thereby presenting a smooth cursor movement trajectory on the screen.
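The scaling, boundary handling, and optional axis flip can be sketched in a few lines. The function name and the `invert_y` flag are hypothetical.

```python
def to_pixels(nx, ny, width, height, invert_y=False):
    """Map normalized (nx, ny) to pixel coordinates clamped to the screen."""
    if invert_y:
        ny = 1.0 - ny            # flip when the sender's Y axis points up
    px = min(width - 1, max(0, round(nx * width)))
    py = min(height - 1, max(0, round(ny * height)))
    return px, py

center = to_pixels(0.5, 0.5, 1920, 1080)
clamped = to_pixels(1.2, -0.1, 1920, 1080)
flipped = to_pixels(0.25, 0.25, 1920, 1080, invert_y=True)
```

Clamping to `width - 1` and `height - 1` keeps the cursor on the last visible pixel when the incoming coordinate reaches or exceeds 1.0.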

[0094] Step S603: Trigger the corresponding interactive response based on the user operation event, execute the operation instruction, and obtain the current interactive response result; The extracted user action event is an abstract logical signal that needs to be converted into a specific interactive action on the receiving end. The system internally maintains an "event-response" mapping table. When a specific event is parsed, a preset interactive response is triggered.

[0095] For example, when the event is a "click," the response might send a "click" message to the currently focused user interface element (such as a button); when the event is a "swipe," the response might send a "page turn" command to the image viewer application or a "scroll" command to the list. This process ensures that remote user actions can precisely control the application on the remote large screen, just like local operations, completing the entire interaction loop from intention issuance to function execution.

[0096] Step S604: Analyze the motion characteristics of the real-time cursor position data stream and calculate the cursor motion state vector; The receiving end does not passively render the cursor position; it continuously analyzes the real-time cursor position data stream. By calculating the cursor displacement and time interval between consecutive frames in real time, the cursor's instantaneous velocity and acceleration can be derived, thus forming a cursor motion state vector.

[0097] Understandably, this vector is a dynamic, high-order feature description; it no longer simply describes "where the cursor is," but rather "how the cursor is moving." For example, by analyzing acceleration, the system can determine whether the user is making precise pointing (low acceleration), rapid swiping (high acceleration), or is stationary. This quantitative analysis of motion features provides data support for understanding the user's operational intentions and predicting the cursor's short-term behavior, enabling the system to handle scenarios such as transmission channel switching more intelligently.
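Velocity and acceleration can be derived from the position stream by finite differences, as in the sketch below. It uses the three most recent samples and assumes timestamps in seconds and positions in pixels; the dictionary layout of the state vector is hypothetical.

```python
import math

def motion_state(samples):
    """Compute a motion state from the last three (t_s, x_px, y_px) samples."""
    (t0, x0, y0), (t1, x1, y1), (t2, x2, y2) = samples[-3:]
    v_prev = ((x1 - x0) / (t1 - t0), (y1 - y0) / (t1 - t0))
    v_now = ((x2 - x1) / (t2 - t1), (y2 - y1) / (t2 - t1))
    dt = (t2 - t0) / 2.0      # time between the midpoints of the two intervals
    accel = ((v_now[0] - v_prev[0]) / dt, (v_now[1] - v_prev[1]) / dt)
    return {
        "position": (x2, y2),
        "velocity": v_now,
        "speed": math.hypot(*v_now),
        "acceleration": accel,
        "accel_magnitude": math.hypot(*accel),
    }

state = motion_state([(0.0, 0.0, 0.0), (1.0, 3.0, 4.0), (2.0, 9.0, 12.0)])
```

Thresholds on `speed` and `accel_magnitude` could then separate stationary, fine pointing, and rapid swiping, with the threshold values chosen empirically.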

[0098] Step S605: Combine the current interaction response result with the cursor movement state vector to update the received session state snapshot and obtain the updated session state snapshot.

[0099] In order to achieve a seamless experience in scenarios such as channel switching, the system needs to generate updated session state snapshots at key nodes.

[0100] Specifically, the current interaction response result is the system state change after the user operation event is executed (such as which button was pressed or which menu was opened); the cursor motion state vector describes the current movement of the cursor. Combining these two results in an updated session state snapshot.

[0101] Understandably, the session state snapshot contains the exact pixel coordinates of the current cursor, its movement speed and direction, the most recently successfully executed operation event, and its result. This snapshot is more intelligent than simply recording the cursor position because it also includes behavioral characteristics. When a transmission channel needs to be switched, this context-rich snapshot is synchronized to the mobile control terminal. Once the switch is complete, the remote receiving terminal can use the data in this snapshot to not only reset the cursor to the correct position but also understand its previous movement trends, thus achieving true state restoration and interaction continuity, with the user experiencing virtually no interruption.
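One possible shape for such a snapshot is sketched below. The field names, the JSON encoding, and the helper functions `update_snapshot`, `encode`, and `decode` are all assumptions for illustration; the text lists the snapshot's contents but does not specify a wire format.

```python
import json
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SessionSnapshot:
    cursor_x: int     # exact pixel coordinates of the cursor
    cursor_y: int
    vx: float         # cursor velocity (speed and direction), px/s
    vy: float
    last_event: str   # most recently executed operation event
    last_result: str  # result of that event


def update_snapshot(prev: SessionSnapshot, x: int, y: int, vx: float, vy: float,
                    event: Optional[str] = None,
                    result: Optional[str] = None) -> SessionSnapshot:
    """Return a new snapshot; keep the previous event/result if none is given."""
    return replace(
        prev, cursor_x=x, cursor_y=y, vx=vx, vy=vy,
        last_event=prev.last_event if event is None else event,
        last_result=prev.last_result if result is None else result,
    )


def encode(snapshot: SessionSnapshot) -> bytes:
    """Serialize a snapshot for synchronization over a transmission channel."""
    return json.dumps(asdict(snapshot)).encode("utf-8")


def decode(payload: bytes) -> SessionSnapshot:
    """Rebuild a snapshot from its serialized form."""
    return SessionSnapshot(**json.loads(payload.decode("utf-8")))
```

Keeping the snapshot immutable means each update yields a new record, which suits storing a history of snapshots for later recovery.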

[0102] In the above implementation, the received optimization instructions are not only accurately converted into smooth cursor trajectories on the screen and precise business function responses, but also endowed with intelligent characteristics of "state awareness" and "behavior understanding". This enables the system to go beyond the role of a passive instruction executor and actively maintain the integrity of the interactive session, providing solid underlying support for coping with network fluctuations and channel switching.

[0103] Referring to Figure 7, as a further implementation of the adaptive multi-channel remote posture interaction method, the method also includes: Step S701: Acquire the raw data stream of the magnetometer from the mobile control terminal in real time, perform magnetic field strength analysis, and calculate the absolute value of the current magnetic field strength and the abrupt change value of the magnetic field strength between adjacent sampling points. The system reads the raw data stream output by the magnetometer in real time, which directly reflects the magnetic field strength vector at the device's location. To quantify the degree of environmental anomaly, the system first calculates the absolute value of the current magnetic field strength, i.e., the magnitude of the magnetic field vector, which reflects the overall strength of the ambient magnetic field. This is used to detect whether the device is in a sustained strong magnetic field (e.g., in close proximity to a magnetized whiteboard). Second, it calculates the abrupt change in magnetic field strength between adjacent sampling points, typically as the difference between the absolute values of magnetic field strength at two consecutive sampling points. This is used to detect instantaneous, severe magnetic field interference (e.g., a mobile phone rapidly approaching a piece of metal). By calculating these two characteristic quantities in parallel, the system captures magnetic field anomalies in terms of both static strength and dynamic rate of change, providing quantifiable input for the subsequent failure assessment.

[0104] Step S702: When the absolute value of the magnetic field strength exceeds the first preset threshold or the abrupt change value of the magnetic field strength exceeds the second preset threshold, a magnetometer failure signal is generated. The calculated continuous characteristic quantities are compared against preset empirical thresholds (such as 200 microtesla for the absolute value of magnetic field strength or 100 microtesla for the abrupt change value) to produce a binary decision. If either the static strength or the dynamic change exceeds its threshold, the current magnetometer data is judged unreliable and a magnetometer failure signal is generated.

[0105] It should be noted that the first preset threshold (for the absolute value) guards against sustained strong magnetic field interference, under which the azimuth information measured by the magnetometer is completely distorted; the second preset threshold (for the abrupt change value) guards against transient, pulsed magnetic field interference, which causes the magnetometer reading to jump drastically and the attitude calculation results to diverge momentarily.
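The dual-threshold check of steps S701 and S702 can be sketched as follows. The default thresholds use the example values from the text (200 and 100 microtesla); the function names and the tuple-based sample format are assumptions for illustration.

```python
from math import sqrt
from typing import Tuple

Vec3 = Tuple[float, float, float]  # magnetometer reading in microtesla (x, y, z)

ABS_LIMIT_UT = 200.0   # first preset threshold (example value from the text)
JUMP_LIMIT_UT = 100.0  # second preset threshold (example value from the text)


def magnitude(sample: Vec3) -> float:
    """Absolute value of the magnetic field strength: the vector magnitude."""
    x, y, z = sample
    return sqrt(x * x + y * y + z * z)


def magnetometer_failed(prev: Vec3, curr: Vec3,
                        abs_limit: float = ABS_LIMIT_UT,
                        jump_limit: float = JUMP_LIMIT_UT) -> bool:
    """True if the static strength or the sample-to-sample jump is too large."""
    m_curr = magnitude(curr)
    jump = abs(m_curr - magnitude(prev))  # abrupt change between adjacent samples
    return m_curr > abs_limit or jump > jump_limit
```

A real implementation might also require the condition to persist for several samples before signaling failure, to avoid reacting to a single noisy reading; the text does not say either way.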

[0106] Step S703: Obtain the current working mode identifier. Step S704: Determine whether the current working mode is nine-axis mode; if so, proceed to step S705; otherwise, perform no operation. The system does not downgrade immediately upon detecting an abnormal magnetic field; instead, it first obtains the current working mode identifier. This is because the downgrade (from nine-axis mode to six-axis mode) is only meaningful when the system is currently in nine-axis mode, which relies on the magnetometer. If the system is already operating in six-axis mode or three-axis mode because of missing hardware or for other reasons, the magnetometer is already out of use, its failure does not affect system operation, and there is no need to trigger a downgrade again.

[0107] Step S705: Trigger the sensor degradation command to update the current working mode identifier to the six-axis mode identifier. Specifically, the system triggers the sensor degradation command only if two preconditions are met simultaneously: the system is currently in nine-axis mode, and the magnetometer has failed. This command is a control signal whose core action is to update the current working mode identifier to the six-axis mode identifier. This update represents a fundamental switch in the system's working mode: from this moment on, the attitude calculation algorithms no longer attempt to fuse magnetometer data to obtain absolute orientation (yaw angle), but rely solely on the accelerometer and gyroscope to calculate relative attitude.

[0108] This step shifts the strategy from a high-precision mode that relies on the magnetometer and has an absolute direction reference to a robust mode that does not rely on the magnetometer and is resistant to magnetic field interference, but may be subject to gyroscope drift.
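The decision logic of steps S703 to S705 amounts to a guarded state transition, sketched below. The `Mode` enum and the function `apply_magnetometer_status` are hypothetical names; the second return value indicates whether the gyroscope drift compensation process should be started.

```python
from enum import Enum
from typing import Tuple


class Mode(Enum):
    NINE_AXIS = "nine_axis"    # gyroscope + accelerometer + magnetometer
    SIX_AXIS = "six_axis"      # gyroscope + accelerometer
    THREE_AXIS = "three_axis"  # single sensor only


def apply_magnetometer_status(mode: Mode, mag_failed: bool) -> Tuple[Mode, bool]:
    """Return (new mode, whether to start gyroscope drift compensation).

    Degradation happens only when both preconditions hold: the system is in
    nine-axis mode and the magnetometer failure signal is present.
    """
    if mag_failed and mode is Mode.NINE_AXIS:
        return Mode.SIX_AXIS, True
    return mode, False  # every other combination is a no-op
```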

[0109] Step S706: In response to the current working mode identifier being updated to a six-axis mode identifier, the gyroscope drift compensation process is triggered. The gyroscope calculates angle changes by integrating angular velocity, but its inherent zero-point drift and temperature drift cause integration errors to accumulate over time, so that the calculated attitude gradually deviates from the true value; this shows up as the cursor slowly moving across the screen on its own. Therefore, in response to the update of the working mode identifier, the system immediately executes the gyroscope drift compensation process.

[0110] Step S707: Calculate the projection of the gravity vector in the device coordinate system based on the accelerometer data, and generate the pitch angle reference value and roll angle reference value. The system uses Earth's gravitational field as an absolute physical reference to compensate for the gyroscope integration drift that arises once the magnetometer reference is lost. In six-axis mode, the system loses the magnetometer's absolute correction of the heading angle (yaw angle), but it can still make full use of the accelerometer. The accelerometer senses the direction of gravity, providing the device with an absolute reference for pitch and roll angles. The tilt angle of the device relative to the vertical line of gravity is calculated from the accelerometer data.
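A common way to obtain these reference angles is the arctangent formulation sketched below. The axis convention is an assumption (a device lying flat reads roughly `(0, 0, +g)`, as on typical mobile platforms), as is the function name; the text does not state which formulas or conventions are used.

```python
from math import atan2, sqrt
from typing import Tuple


def pitch_roll_from_accel(ax: float, ay: float, az: float) -> Tuple[float, float]:
    """Pitch and roll reference angles (radians) from one accelerometer sample.

    Assumes the reading is dominated by gravity, i.e. the device is not
    accelerating strongly, and that a device lying flat reads (0, 0, +g).
    """
    pitch = atan2(-ax, sqrt(ay * ay + az * az))
    roll = atan2(ay, az)
    return pitch, roll
```

Only the direction of the measured vector matters here, so the result is the same whether the input is in m/s² or in units of g.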

[0111] Step S708: Input the pitch angle reference value, roll angle reference value and gyroscope angular velocity data into the Kalman filter to dynamically correct the integral result of the gyroscope angular velocity. Accelerometer data is contaminated by linear acceleration when the device is in motion and cannot be used directly; a Kalman filter is therefore introduced as an optimal estimator.

[0112] In this embodiment, the core idea of the Kalman filter is to combine the advantages of two sensors: it trusts the gyroscope's high accuracy and dynamic response in measuring angular velocity over a short time, predicting attitude changes through integration; however, it does not completely trust the gyroscope because its error accumulates with integration (i.e., drift). The filter compares the attitude predicted by the gyroscope's integration with a "reference value" calculated from the accelerometer, which contains noise but has no long-term drift. Through gain calculation, the filter intelligently corrects the gyroscope's prediction, outputting an optimal estimate. This value maintains the dynamic smoothness of the gyroscope while being continuously pulled back to the correct path by the absolute reference of the accelerometer, thereby effectively suppressing pitch and roll drift.
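The predict-and-correct cycle described above can be illustrated with a widely used single-axis formulation whose state is the angle and the gyroscope bias. This is a sketch under stated assumptions, not the filter disclosed in the application: the class name, the two-element state, and the noise parameters `q_angle`, `q_bias`, and `r_measure` (typical hobbyist tuning values) are all chosen here for illustration. One instance would be run for pitch and another for roll.

```python
class AngleKalman:
    """Single-axis Kalman filter with state [angle, gyroscope bias]."""

    def __init__(self, q_angle: float = 0.001, q_bias: float = 0.003,
                 r_measure: float = 0.03) -> None:
        self.q_angle = q_angle      # process noise of the angle (assumed value)
        self.q_bias = q_bias        # process noise of the bias (assumed value)
        self.r_measure = r_measure  # accelerometer angle noise (assumed value)
        self.angle = 0.0
        self.bias = 0.0
        self.P = [[0.0, 0.0], [0.0, 0.0]]  # error covariance

    def update(self, accel_angle: float, gyro_rate: float, dt: float) -> float:
        P = self.P
        # Predict: integrate the bias-corrected angular velocity.
        self.angle += dt * (gyro_rate - self.bias)
        P[0][0] += dt * (dt * P[1][1] - P[0][1] - P[1][0] + self.q_angle)
        P[0][1] -= dt * P[1][1]
        P[1][0] -= dt * P[1][1]
        P[1][1] += self.q_bias * dt
        # Correct: pull the prediction toward the accelerometer reference.
        s = P[0][0] + self.r_measure
        k0, k1 = P[0][0] / s, P[1][0] / s   # Kalman gain
        residual = accel_angle - self.angle
        self.angle += k0 * residual
        self.bias += k1 * residual
        p00, p01 = P[0][0], P[0][1]
        P[0][0] -= k0 * p00
        P[0][1] -= k0 * p01
        P[1][0] -= k1 * p00
        P[1][1] -= k1 * p01
        return self.angle
```

Because the bias is part of the state, a constant gyroscope offset is gradually absorbed into `bias` rather than accumulating in the angle. Raising `r_measure` makes the filter trust the accelerometer less, which helps when linear acceleration contaminates the reference.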

[0113] Step S709: Output the standardized attitude vector corrected by the Kalman filter to the trajectory prediction model.

[0114] The attitude data obtained after Kalman filter correction (whether in quaternion or Euler angle form) has been optimized to suppress drift as far as possible. This step outputs the corrected standardized attitude vector.

[0115] It's important to note that the data format and interface specifications of the output vector in this step are completely consistent with those in nine-axis mode. This means that downstream trajectory prediction models and all other processing modules do not need to concern themselves with whether the upstream is running in nine-axis or six-axis mode. They consistently receive a well-defined, reliable, and standardized attitude vector. This design allows the core attitude awareness capability to smoothly degrade without the user's awareness when faced with hardware deficiencies or environmental interference, minimizing performance loss.

[0116] In the above implementation, real-time monitoring of magnetic field environment characteristics and failure judgment based on dual thresholds enable rapid and accurate perception of abnormal magnetometer states. By jointly judging the failure state with the current operating mode, the rigor and context relevance of the degradation decision are ensured. Finally, by triggering the update of the operating mode identifier and starting the gyroscope drift compensation algorithm based on Kalman filtering, not only is a seamless and fault-tolerant degradation from nine-axis mode to six-axis mode achieved, but also the inherent gyroscope drift problem after degradation is effectively compensated through an advanced sensor fusion algorithm.

[0117] In practical applications, this technical solution ensures that users can still obtain a continuous, stable, and usable remote control experience in various complex real-world scenarios, such as when near a magnetized whiteboard or metal furniture. By minimizing the impact of local sensor failure on the overall user experience, it greatly enhances the robustness and reliability of the system in real and complex environments.

[0118] Referring to Figure 8, as a further implementation of the adaptive multi-channel remote attitude interaction method, after step S108 of storing the updated session state snapshot to the trajectory tracking database of the mobile control terminal, the method further includes: Step S801: Extract cursor trajectory behavior features from the updated session state snapshot in the trajectory tracking database. The updated session state snapshots recorded in the trajectory tracking database contain a sequence of cursor coordinates changing over time in the screen coordinate system.

[0119] In this embodiment, a series of behavioral features characterizing motion dynamics are calculated by analyzing continuous coordinate data over a short period (e.g., the most recent second). These features typically include kinematic parameters such as the cursor's instantaneous velocity, acceleration, and rate of change of motion direction. For example, the instantaneous velocity is obtained by calculating the ratio of the cursor's displacement difference between consecutive frames to the time interval; the acceleration is obtained by further differentiating the velocity sequence.

[0120] These calculated feature values collectively constitute a multi-dimensional behavioral feature vector. This vector is no longer merely a record of position but a description of how the cursor moves, for example whether it moves rapidly, smoothly, linearly, or along a curve. This step elevates low-order coordinate data to a higher-order behavioral description, providing input for the subsequent classification.

[0121] Step S802: Classify motion patterns based on the cursor trajectory behavior features and generate behavior feature labels, where the behavior feature labels include rapid swiping, fine-tuning, and stable hovering. Specifically, predefined classification rules or a lightweight classification model map the continuous feature vectors to discrete, semantically defined behavior feature labels. The system sets a series of judgment thresholds or decision boundaries based on physical motion characteristics. For example, when the average speed is consistently above a high threshold (e.g., 300 pixels/second) and the acceleration variance is large, the system classifies the current user operation as "rapid swiping," which typically corresponds to page turning or quick browsing. When the average speed is very low (e.g., below 50 pixels/second) and the change in motion direction is minimal, the operation is classified as "fine-tuning," corresponding to scenarios such as precisely clicking buttons or selecting small targets. When the speed is nearly zero and remains so for a period of time, the operation is classified as "stable hovering," which may mean the user is reading menu content. In essence, this classification assigns fuzzy, continuous user actions to a small number of typical interaction patterns that the program can act on.
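A rule-based version of this classification might look like the sketch below. The 300 and 50 pixels/second thresholds come from the text; the "nearly zero" speed limit, the acceleration-variance floor, the direction-change limit, the label strings, and the `"unclassified"` fallback are all assumptions added for illustration.

```python
from statistics import mean, pvariance
from typing import Sequence


def classify_motion(speeds: Sequence[float], accels: Sequence[float],
                    turn_rates: Sequence[float], *,
                    fast: float = 300.0,          # px/s, from the text
                    slow: float = 50.0,           # px/s, from the text
                    still: float = 5.0,           # px/s, assumed "nearly zero"
                    accel_var_min: float = 1e4,   # assumed "large" variance
                    turn_max: float = 0.5) -> str:  # rad/s, assumed "minimal"
    """Map one window of motion features to a behavior feature label."""
    if not speeds or not accels or not turn_rates:
        raise ValueError("feature windows must be non-empty")
    # Speed stays near zero over the whole window.
    if max(speeds) <= still:
        return "stable_hovering"
    # Very low average speed with little change of direction.
    if mean(speeds) < slow and mean([abs(t) for t in turn_rates]) <= turn_max:
        return "fine_tuning"
    # Speed consistently above the high threshold and widely varying acceleration.
    if min(speeds) > fast and pvariance(accels) >= accel_var_min:
        return "rapid_swiping"
    return "unclassified"
```

The hovering rule is checked first because a motionless cursor would otherwise also satisfy the fine-tuning rule.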

[0122] Step S803: In response to the type of behavior feature label, dynamically adjust the configuration parameters of the trajectory prediction model. Specifically, if the behavior feature label is rapid swiping, the time window range of the trajectory prediction model is expanded; if the behavior feature label is fine-tuning, the neural network layer depth of the trajectory prediction model is increased. Different operation patterns place conflicting requirements on the prediction model, so no single fixed set of parameters gives the best results for all of them; the trajectory prediction model therefore needs to be adjusted dynamically.

[0123] For example, when the "rapid swipe" label is detected, it indicates that the user is performing a large, high-speed operation. Such operations have significant inertia, and their short-term trajectories exhibit strong trends. In this case, expanding the "prediction time window" (e.g., from 100ms to 150ms) means that the model will refer to historical data over a longer period when making predictions. This allows the model to better capture and continue the current rapid swipe trend, predicting long-term trajectories that better conform to the laws of physical inertia, and avoiding the phenomenon of predicted trajectories being too short or prematurely "braking" during high-speed motion.

[0124] Conversely, when the "fine-tuning" label is identified, the user's operation is small and may include high-frequency noise such as physiological hand tremors, while requiring extremely high accuracy. In this case, increasing the number of layers in the LSTM network (e.g., from 2 to 3 layers) aims to improve the model's non-linear expressive power and feature abstraction level. Deeper networks can better extract the true user intent from subtle and complex motion sequences and effectively filter out random jitter, thus outputting smoother and more accurate predicted trajectories, which is crucial for achieving pixel-level precise localization.

[0125] Step S804: Redeploy the trajectory prediction model with updated configuration parameters. The determined parameter adjustment scheme is hot-updated into the running prediction model, and the optimized model is immediately used to improve the current interactive experience. "Redeployment" is a dynamic loading process. It does not retrain the model, but replaces or resets the model's key operating parameters (such as time window length and network structure depth) in memory. This process should be efficient enough to maintain the real-time nature of the interaction.
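The parameter selection in steps S803 and S804 can be sketched as a pure function from label to configuration. The example values (100 ms to 150 ms, 2 to 3 layers) come from the text; the `PredictorConfig` type, its field names, and the label strings are hypothetical. The sketch only selects parameters: how a model with a different depth obtains usable weights without retraining is not specified in the text, and one plausible reading is that a variant for each depth is prepared in advance.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PredictorConfig:
    window_ms: int = 100  # prediction time window (baseline from the text)
    lstm_layers: int = 2  # network depth (baseline from the text)


BASELINE = PredictorConfig()


def config_for(label: str, base: PredictorConfig = BASELINE) -> PredictorConfig:
    """Derive the configuration for a behavior feature label from the baseline."""
    if label == "rapid_swiping":
        return replace(base, window_ms=150)  # expand the time window
    if label == "fine_tuning":
        return replace(base, lstm_layers=3)  # increase the layer depth
    return base  # e.g. stable hovering: keep the baseline
```

Deriving each configuration from a fixed baseline, rather than from the current one, keeps repeated labels from growing the window or depth without bound.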

[0126] Step S805: Regenerate the predicted trajectory data based on the redeployed trajectory prediction model and replace the original predicted trajectory data.

[0127] The system utilizes a trajectory prediction model with adjusted parameters to process the standardized attitude vector of new inputs. Since the model parameters have been optimized based on the current operating mode, the regenerated predicted trajectory data will better match the user's actual behavioral intentions; for example, it will be more forward-looking during rapid swipes and more robust and accurate during fine-tuning.

[0128] In the above implementation, the model is no longer a static algorithm but an intelligent agent capable of understanding the user's operational intentions (whether it's quick browsing or precise positioning) and self-optimizing. Ultimately, this technical solution enables the remote gesture interaction system to provide smoother trajectory predictions that better match inertial expectations in high-speed sliding scenarios, while providing more stable guidance that suppresses jitter and improves positioning accuracy in fine-operation scenarios. This allows it to provide optimal prediction performance in various differentiated usage scenarios, significantly improving the accuracy of remote control and the overall intelligence level of the user experience.

[0129] This application also discloses an adaptive multi-channel remote posture interaction system.

[0130] An adaptive multi-channel remote posture interaction system, specifically comprising: a working mode decision module, used to obtain the sensor configuration information of the mobile control terminal and generate a working mode identifier based on the availability status of the gyroscope, accelerometer and magnetometer; an attitude calculation module, used to collect the raw sensor data stream from the mobile control terminal, perform attitude calculation based on the working mode identifier, and generate a standardized attitude vector; a trajectory prediction module, used to train a trajectory prediction model based on historical attitude vector sequences and input the standardized attitude vector into the trajectory prediction model to generate predicted trajectory data; an optimized instruction data generation module, used to obtain the bandwidth characteristics of the current transmission channel, combine the predicted trajectory data, perform frame rate adaptation and motion smoothing on the standardized attitude vector, and generate optimized instruction data packets; a local feedback separation and transmission module, used to acquire user operation event data, send the optimized instruction data packets to the remote receiving end through the current transmission channel, and perform hierarchical local feedback based on the user operation event data; a session state snapshot module, used to monitor the network quality indicators of the current transmission channel and, when the network packet loss rate exceeds the preset packet loss rate threshold, generate a session state snapshot containing the cursor position and event sequence and synchronize it to the backup transmission channel; a channel hot-switching module, used to update the current transmission channel to the backup transmission channel; a remote rendering and behavior analysis module, used to receive the optimized instruction data packets and session state snapshots at the remote receiving end, parse them, render the screen cursor trajectory, and generate updated session state snapshots based on the cursor trajectory behavior characteristics; and a trajectory tracking database module, used to store the updated session state snapshots to the trajectory tracking database of the mobile control terminal for state recovery during subsequent transmission channel switching.

[0131] An adaptive multi-channel remote posture interaction system according to an embodiment of this application can implement any of the above methods, and the specific working process of each module in the system can refer to the corresponding process in the above method embodiments.

[0132] In the several embodiments provided in this application, it should be understood that the provided methods and systems can be implemented in other ways. For example, the system embodiments described above are merely illustrative: the division into modules is merely a logical functional division, and other divisions are possible in actual implementation; multiple modules may be combined or integrated into another system, or some features may be ignored or not executed.

[0133] This application also discloses a computer-readable storage medium.

[0134] A computer-readable storage medium storing a computer program that can be loaded by a processor to execute any of the adaptive multi-channel remote gesture interaction methods described above.

[0135] The computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device; the program code contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to wireless, wire, optical fiber, RF, etc., or any suitable combination thereof.

[0136] The above are all preferred embodiments of this application and are not intended to limit the scope of protection of this application. Any feature disclosed in this specification (including the abstract and drawings) may be replaced by other equivalent or similar features unless specifically stated otherwise. That is, unless specifically stated otherwise, each feature is only one example of a series of equivalent or similar features.

Claims

1. An adaptive multi-channel remote gesture interaction method, characterized in that, The method includes: Obtain sensor configuration information from the mobile control terminal and generate a working mode identifier based on the availability status of the gyroscope, accelerometer, and magnetometer; The system collects raw sensor data streams from the mobile control terminal, performs attitude calculations based on the working mode identifier, and generates standardized attitude vectors. A trajectory prediction model is trained based on a historical attitude vector sequence. The standardized attitude vector is then input into the trajectory prediction model to generate predicted trajectory data. Obtain the bandwidth characteristics of the current transmission channel, combine them with the predicted trajectory data, perform frame rate adaptation and motion smoothing on the standardized attitude vector, and generate an optimized instruction data packet; Acquire user operation event data, send the optimization instruction data packet to the remote receiving end through the current transmission channel, and simultaneously perform hierarchical local feedback based on the user operation event data; Monitor the network quality indicators of the current transmission channel. When the network packet loss rate exceeds the preset packet loss rate threshold, generate a session state snapshot containing the cursor position and event sequence and synchronize it to the backup transmission channel, and update the current transmission channel to the backup transmission channel. The optimized instruction data packet and session state snapshot are received at the remote receiving end, parsed and rendered to show the screen cursor trajectory, and an updated session state snapshot is generated based on the cursor trajectory behavior characteristics. 
The updated session state snapshot is stored in the trajectory tracking database of the mobile control terminal for state recovery during subsequent transmission channel switching.

2. The adaptive multi-channel remote posture interaction method according to claim 1, characterized in that, The steps of acquiring the raw sensor data stream from the mobile control terminal, performing attitude calculation based on the working mode identifier, and generating a standardized attitude vector include: The three-axis angular velocity data, three-axis acceleration data, and three-axis magnetic field data of the mobile control terminal are collected, and timestamp alignment and unit normalization are performed respectively to generate a synchronous sensor data stream; Based on the working mode identifier, an attitude calculation strategy is assigned, and an attitude quaternion is generated; The attitude quaternion is transformed to map the attitude quaternion in the device coordinate system to the screen reference coordinate system. Extract the attitude angle vectors after coordinate system transformation, including pitch angle, roll angle and yaw angle, and generate a standardized attitude vector.

3. The adaptive multi-channel remote posture interaction method according to claim 2, characterized in that, The steps for training a trajectory prediction model based on historical attitude vector sequences and inputting the standardized attitude vectors into the trajectory prediction model to generate predicted trajectory data include: Load historical attitude vector sequences from a pre-stored trajectory tracking database and construct a training dataset by sampling according to time windows; Initialize the gated recurrent neural network model structure and use the training dataset as input for time series feature extraction; The network weight parameters are optimized using the gradient descent algorithm to generate a trajectory prediction model. The standardized attitude vectors acquired in real time are input into the trajectory prediction model, which outputs the attitude coordinate prediction values for future time windows. By integrating the current standardized attitude vectors and attitude coordinate predictions, predicted trajectory data containing temporal relationships is generated.

4. The adaptive multi-channel remote posture interaction method according to claim 3, characterized in that, The steps of obtaining the bandwidth characteristics of the current transmission channel, combining the predicted trajectory data, performing frame rate adaptation and motion smoothing on the standardized attitude vector, and generating an optimized instruction data packet include: Analyze the bandwidth characteristics of the current transmission channel to determine the target frame rate threshold; Based on the target frame rate threshold, the standardized pose vector sequence is resampled in the time dimension; The predicted trajectory data is spatiotemporally aligned with the resampled attitude vector sequence to generate a motion trajectory feature vector. The motion trajectory feature vector is subjected to nonlinear filtering to generate a smooth trajectory sequence; The smooth trajectory sequence and user operation event data are encapsulated to generate an optimized instruction data package.

5. The adaptive multi-channel remote posture interaction method according to claim 1, characterized in that, The steps for performing hierarchical local feedback based on the user action event data include: The received user action event data is type-identified to determine the event type; A preset set of feedback rules, including tactile vibration mode and visual display mode, is assigned according to the event type. Based on the feedback rule set, the corresponding tactile vibration signal and / or visual animation signal are locally triggered on the mobile control terminal. The tactile vibration signal is executed by a vibration motor drive unit, and the visual animation signal is rendered by a touch screen display unit.

6. The adaptive multi-channel remote posture interaction method according to claim 1, characterized in that, The steps of receiving the optimization instruction data packet and session state snapshot at the remote receiving end, parsing and rendering the screen cursor trajectory, and generating an updated session state snapshot based on the cursor trajectory behavior characteristics include: The received optimization instruction data packets are parsed to extract the trajectory coordinate sequence and user operation events; The trajectory coordinate sequence is mapped to the screen coordinate system to generate a real-time cursor position data stream; Based on user operation events, corresponding interactive responses are triggered, operation instructions are executed, and the current interactive response result is obtained; Analyze the motion characteristics of the real-time cursor position data stream and calculate the cursor motion state vector; By combining the current interaction response result with the cursor movement state vector, the received session state snapshot is updated to obtain an updated session state snapshot.

7. The adaptive multi-channel remote posture interaction method according to claim 1, characterized in that, The method further includes: The system acquires the raw data stream from the magnetometer on the mobile control terminal in real time and performs magnetic field strength analysis, calculating the absolute value of the current magnetic field strength and the abrupt change value of the magnetic field strength at adjacent sampling points. When the absolute value of the magnetic field strength exceeds a first preset threshold or the sudden change value of the magnetic field strength exceeds a second preset threshold, a magnetometer failure signal is generated. Obtain the current working mode identifier and determine whether the current working mode is nine-axis mode; If so, a sensor degradation command is triggered to update the current working mode identifier to a six-axis mode identifier; In response to the current working mode identifier being updated to a six-axis mode identifier, the gyroscope drift compensation process is triggered. Calculate the projection of the gravity vector in the device coordinate system based on accelerometer data, and generate pitch angle reference values and roll angle reference values; The pitch angle reference value, roll angle reference value and gyroscope angular velocity data are input into the Kalman filter to dynamically correct the integral result of the gyroscope angular velocity. The normalized attitude vector, corrected by a Kalman filter, is output to the trajectory prediction model.

8. An adaptive multi-channel remote posture interaction method according to any one of claims 1 to 7, characterized in that, After the step of storing the updated session state snapshot to the trajectory tracking database of the mobile control terminal, the method further includes: Extract cursor trajectory behavior features from the updated session state snapshot from the trajectory tracking database; Based on the cursor trajectory behavior characteristics, motion patterns are classified to generate behavior feature labels; wherein, the behavior feature labels include rapid sliding, fine-tuning, and stable hovering; In response to the type of the behavioral feature label, the configuration parameters of the trajectory prediction model are dynamically adjusted: if the behavioral feature label is rapid sliding, the time window range of the trajectory prediction model is expanded; if the behavioral feature label is fine-tuning, the neural network layer depth of the trajectory prediction model is increased. Redeploy the trajectory prediction model with updated configuration parameters; The predicted trajectory data is regenerated based on the redeployed trajectory prediction model and replaces the original predicted trajectory data.

9. An adaptive multi-channel remote posture interaction system, characterized in that the system includes: a working mode decision module, used to obtain the sensor configuration information of the mobile control terminal and generate a working mode identifier based on the availability status of the gyroscope, accelerometer and magnetometer; an attitude calculation module, used to collect the raw sensor data stream of the mobile control terminal, perform attitude calculation according to the working mode identifier, and generate a standardized attitude vector; a trajectory prediction module, used to train a trajectory prediction model based on historical attitude vector sequences, and input the standardized attitude vector into the trajectory prediction model to generate predicted trajectory data; an optimized instruction data generation module, used to obtain the bandwidth characteristics of the current transmission channel and, in combination with the predicted trajectory data, perform frame rate adaptation and motion smoothing processing on the standardized attitude vector to generate an optimized instruction data packet; a local feedback separation and transmission module, used to acquire user operation event data, send the optimized instruction data packet to the remote receiving end through the current transmission channel, and perform hierarchical local feedback based on the user operation event data; a session state snapshot module, used to monitor the network quality indicators of the current transmission channel and, when the network packet loss rate exceeds a preset packet loss rate threshold, generate a session state snapshot containing the cursor position and event sequence and synchronize it to the backup transmission channel; a channel hot-switching module, used to update the current transmission channel to the backup transmission channel; a remote rendering and behavior analysis module, used to receive the optimized instruction data packet and the session state snapshot at the remote receiving end, parse them and render the screen cursor trajectory, and generate an updated session state snapshot based on the cursor trajectory behavior features; and a trajectory tracking database module, used to store the updated session state snapshot to the trajectory tracking database of the mobile control terminal for state recovery during subsequent transmission channel switching.
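
The interplay between the session state snapshot module and the channel hot-switching module can be illustrated with a short Python sketch. The class names, the in-memory `synced` store standing in for synchronization to the backup channel, and the choice to swap the roles of the two channels after switching are assumptions, not details taken from the claims.

```python
from dataclasses import dataclass, field


@dataclass
class SessionSnapshot:
    """Snapshot contents named in the claim: cursor position and event sequence."""
    cursor_position: tuple
    event_sequence: list


@dataclass
class ChannelManager:
    """Hypothetical holder of the current and backup transmission channels."""
    current: str
    backup: str
    loss_threshold: float
    synced: dict = field(default_factory=dict)  # channel name -> last synced snapshot

    def on_quality_sample(self, loss_rate, cursor_position, events):
        """Snapshot and hot-switch when the packet loss rate exceeds the threshold."""
        if loss_rate <= self.loss_threshold:
            return None  # channel is healthy; nothing to do
        snapshot = SessionSnapshot(cursor_position, list(events))
        # Synchronize the snapshot to the backup channel before switching.
        self.synced[self.backup] = snapshot
        # Hot switch: the backup becomes current (old channel kept as the new backup).
        self.current, self.backup = self.backup, self.current
        return snapshot
```

Synchronizing the snapshot before the switch means the remote receiving end can resume rendering from the recorded cursor position as soon as the first packet arrives on the new channel.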

10. A computer-readable storage medium, characterized in that it stores a computer program that can be loaded by a processor to execute the method according to any one of claims 1 to 8.