RPA mouse and keyboard control method and system for Wayland desktop
By registering virtual input devices in the Wayland environment and using local Unix domain sockets for communication, the limitations of the Wayland security model are resolved. This enables high-precision, programmable, and cross-environment mouse and keyboard event injection, suitable for various desktop environments, and possesses good versatility and security.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- 浙江实在智能科技有限公司
- Filing Date
- 2026-03-18
- Publication Date
- 2026-06-30
AI Technical Summary
The existing Wayland security model limits the ability of traditional automation tools to directly inject input events, making it difficult to achieve high-precision, programmable, cross-environment mouse and keyboard event injection in the Wayland environment.
By registering virtual input devices in user space and utilizing the kernel event system, combined with local Unix domain socket communication, the system enables simulated control of mouse and keyboard events. It adopts a structured instruction protocol in JSON format, supports absolute coordinates and key combinations, and combines system permission management to ensure security and flexibility.
It achieves high-precision, cross-environment mouse and keyboard event injection in the Wayland environment, supports absolute coordinates and key combination operations, is suitable for various desktop environments, has good versatility and security, adapts to containerized deployment, and avoids network exposure and permission risks.
Figure CN121879894B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of computer human-computer interaction and automation control technology, specifically relating to an RPA mouse and keyboard control method and system for Wayland desktop.
Background Technology
[0002] In Linux desktop systems, the X protocol (i.e., X Window System, often referred to as X11 or X) and Wayland are two core display server protocols used to manage graphics displays. The X protocol has a flexible architecture and supports network transparency, but its long-standing complexity and security issues have spurred the development of a new generation of protocols.
[0003] With the development of desktop automation (RPA) technology, more and more application scenarios require simulating user input operations in a graphical interface, such as mouse clicks, keyboard input, and window switching. In traditional X11 graphics systems, tools such as xdotool and autokey can call the graphics server interface via the X protocol to achieve simulated input control in the desktop environment.
[0004] Wayland is a lightweight display server protocol designed for modern graphics environments, aiming to simplify the graphics stack and improve performance and security. It offloads the responsibilities of a traditional display server to the client and compositor, reducing intermediate steps, thereby lowering latency and increasing efficiency. Today, Wayland has become the default choice for most mainstream Linux desktop environments, such as GNOME and KDE Plasma.
[0005] However, in recent years, Linux desktop systems have gradually migrated to the Wayland protocol. Wayland, with security as its core design principle, restricts the ability of external programs to directly inject input events. Because Wayland clients cannot access each other's input and drawing contexts, existing automation tools that rely on X11 cannot function properly in pure Wayland sessions, making it difficult to continue supporting the input control requirements of RPA.
[0006] While Wayland's security model improves overall system security, it also renders traditional automated methods based on inter-process event snooping/injection ineffective, creating a technical contradiction between "security and automation".
[0007] Currently, the main representative methods for implementing graphical desktop input control include the following:
[0008] 1. Input injection tool based on the X11 protocol
[0009] Common tools such as xdotool, xte, and autokey utilize the APIs provided by the X Server to simulate mouse and keyboard events. These methods rely on the traditional X11 protocol and are only available under the X Server; they have gradually become obsolete as modern mainstream desktop systems transition to the Wayland protocol.
[0010] 2. Based on the custom interface provided by the window manager
[0011] Some Wayland-compatible window managers (such as Sway and KWin) provide additional control interfaces (such as swaymsg and KWin scripts) to support partial automation. However, these solutions are heavily dependent on specific desktop environments and lack versatility; moreover, their exposed interface capabilities are limited, making it difficult to meet the needs of complex automation processes, such as key combination simulation and coordinate-level mouse control.
[0012] 3. Bypassing based on compositor plugins or security policies
[0013] Some advanced users have attempted to achieve input injection by modifying the Wayland compositor or using security policy bypass mechanisms (such as enabling special permissions, SELinux adjustments, etc.), but such approaches require deep intervention in the underlying graphics system, which poses security risks, is complex to implement, and has poor compatibility, making it difficult to promote to standard production environments or distributions.
[0014] 4. Start the legacy compatibility service via XWayland
[0015] Another approach involves enabling the XWayland service within Wayland and then using tools like xdotool to operate it indirectly. However, this method's operation window is limited to the XWayland sub-environment, preventing access to the native Wayland application, and it increases system resource consumption and operational complexity.
[0016] 5. Limitations of ydotool
[0017] Although ydotool can simulate input in the Wayland environment through uinput, it is only a command-line tool, lacks a service architecture, does not support remote calls and multi-client communication, and has no unified control protocol, making it difficult to integrate into automation systems. In addition, it lacks permission management and execution feedback mechanisms, resulting in insufficient versatility and security.
[0018] Therefore, it is crucial to design a desktop input control method that does not rely on the graphics protocol layer and works in conjunction with the kernel virtual input device and the local secure communication channel, so as to achieve high-precision, programmable, and cross-environment mouse and keyboard event injection while maintaining the Wayland security model.
Summary of the Invention
[0019] This invention aims to overcome the technical contradiction of "security vs. automation" in the existing technology, where the existing Wayland security model improves the overall system security but also causes traditional automation methods based on inter-process event snooping/injection to fail. The invention provides an RPA mouse and keyboard control method and system for Wayland desktop that can achieve high-precision, programmable, and cross-environment mouse and keyboard event injection while maintaining the Wayland security model.
[0020] To achieve the above-mentioned objectives, the present invention adopts the following technical solution:
[0021] The RPA mouse and keyboard control method for Wayland desktop includes the following steps:
[0022] S1, Initialize the virtual input device:
[0023] In user space, a virtual input device supporting mouse and keyboard functions is created and registered by accessing the virtual input device interface provided by the operating system kernel; the virtual input device is configured as a touchpad type and reports standard input events to the desktop system through the kernel event system.
[0024] S2, Get screen resolution:
[0025] Detect the current active display output and obtain screen resolution information;
[0026] S3, build a local communication interface:
[0027] The server program listens on a preset local Unix domain socket path to receive control commands from clients;
[0028] S4, define and parse the instruction protocol:
[0029] The client sends a control request in a structured format to the local Unix domain socket path, and the server receives and parses the control request.
[0030] S5, execute the instruction and simulate input:
[0031] The server maps the parsed control requests to the corresponding kernel input event sequence according to the operation type; at the same time, it injects the input events into the system by writing event data to the virtual input device to simulate control operations on the graphical desktop.
[0032] Preferably, in step S2, the method for obtaining screen resolution information includes:
[0033] The current framebuffer information can be obtained by parsing the sysfs file system, calling the udev library interface, accessing the logind D-Bus interface, or calling the libdrm/gbm library.
[0034] Preferably, in step S3, the access permissions of the local Unix domain socket file are set to allow only specific users or user groups to read and write.
[0035] Preferably, in step S4, the control request includes at least mouse movement, mouse click, key input, and combination key operation instructions.
[0036] Preferably, in step S4, the structured format is JSON format; the mouse movement command includes absolute coordinate parameters, and the server converts the absolute coordinates into the corresponding ABS_X absolute position event value and ABS_Y absolute position event value according to the obtained screen resolution.
[0037] Preferably, in step S5, during the simulated control operation of the graphical desktop, for combination key operations, a preset delay is inserted between the execution of each key event; the preset delay is 50 milliseconds.
[0038] As a preferred option, a permissions management process is also included, as follows:
[0039] Access permissions for virtual input devices can be granted to specific user groups by configuring udev rules, or the user service unit of the system daemon systemd can be used to perform sandboxed permission control on the server program.
[0040] Preferably, the server program runs in a background persistent mode and supports multi-client access management as well as an extended command interface based on JSON-RPC style.
[0041] This invention also provides an RPA mouse and keyboard control system for Wayland desktops, comprising:
[0042] The virtual input device initialization module is used to create and register a virtual input device that supports mouse and keyboard functions in user space by accessing the virtual input device interface provided by the operating system kernel; the virtual input device is configured as a touchpad type and reports standard input events to the desktop system through the kernel event system;
[0043] The screen resolution acquisition module is used to detect the currently active display output and acquire screen resolution information;
[0044] The local communication interface building module is used to enable the server program to listen on a preset local Unix domain socket path to receive control commands from the client.
[0045] The instruction protocol definition and parsing module is used to enable the client to send control requests to the local Unix domain socket path in a structured format, and the server to receive and parse the control requests;
[0046] The instruction execution and input simulation module is used to enable the server to map parsed control requests to corresponding kernel input event sequences based on the operation type and, at the same time, to inject input events into the system by writing event data to the virtual input device to simulate control operations on the graphical desktop.
[0047] Compared with existing technologies, the advantages of this invention are:
(1) Breaking through the limitations of the Wayland security model and realizing realistic input simulation: this invention registers virtual devices through the underlying kernel interface uinput, bypassing the Wayland desktop's restrictions on input injection. It achieves high-fidelity mouse and keyboard event simulation without modifying the graphics protocol stack, and is applicable to all desktop environments using the Wayland protocol;
(2) Clear communication protocol structure, suitable for automated system integration: it adopts a structured JSON protocol and local Unix socket communication. The instruction format is unified and parsing is simple, which makes it convenient to embed in various RPA systems, scripting languages, or remote control programs, and gives it good cross-language and cross-platform integration capabilities;
(3) No dependence on specific desktop components and strong versatility: this invention does not depend on the private extension interface of any specific window manager or compositor (such as swaymsg or KWin scripts), is highly versatile and portable, and is suitable for various distributions, desktop environments, and deployment forms;
(4) High precision and fine control granularity: the technical solution supports mouse movement with absolute and relative coordinates, input of arbitrary key values, combination key press and release, scroll wheel scrolling, and other fine-grained control operations, which meet the needs of complex automated testing, interactive simulation, input replay, and similar scenarios;
(5) Flexible deployment and controllable security: the server only listens on a local socket file and does not need to open network ports; it can be combined with system permission mechanisms (such as group permissions and systemd services);
(6) Strong adaptability, supporting container and sandbox running environments: the solution can run in containerized or sandboxed environments such as Docker, Flatpak, and Snap. It can be executed as long as it has uinput permission, without being restricted by the desktop running architecture, which facilitates distribution and isolated deployment;
(7) This invention provides a general, efficient, and scalable desktop input control mechanism, which is particularly suitable for realizing automated interactive control in the Wayland protocol environment, and provides key underlying support for RPA tools and desktop automation systems.
Attached Figure Description
[0048] Figure 1 is a schematic diagram of an RPA mouse and keyboard control method for Wayland desktop in this invention;
[0049] Figure 2 is a schematic diagram illustrating the principle of constructing a local communication interface in this invention.
Detailed Implementation
[0050] To more clearly illustrate the embodiments of the present invention, specific implementation methods will be described below with reference to the accompanying drawings. Obviously, the drawings described below are merely some embodiments of the present invention. For those skilled in the art, other drawings and other implementation methods can be obtained based on these drawings without any creative effort.
[0051] As shown in Figure 1, this invention provides an RPA mouse and keyboard control method for the Wayland desktop. Its core lies in registering a virtual input device using the uinput interface provided by the Linux kernel, and receiving structured control commands sent by the client through a local Unix domain socket, thereby achieving simulated control of the mouse and keyboard in the Wayland desktop environment. Specifically, the steps are as follows:
[0052] 1. Initialize the virtual input device:
[0053] In user space, a virtual input device supporting mouse and keyboard functions is created by accessing the /dev/uinput device file (virtual input device interface). As an RPA driver, this device reports standard input events (such as EV_KEY, EV_REL, EV_ABS) to the desktop system through the kernel event system.
[0054] Given the specific nature of the Wayland protocol, the mouse driver needs to be registered as a touchpad type. When moving the mouse, ABS_X and ABS_Y absolute position events must be sent to handle absolute position movement. Since absolute position movement requires the current screen resolution, before creating the touchpad driver it is also necessary to detect the active display output or DRM/KMS status and obtain the screen resolution. This can be done through sysfs (/sys/class/drm/*/modes), udev, or the logind D-Bus interface, or by calling libdrm/gbm to obtain the current framebuffer information.
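The sysfs route mentioned above could look roughly like the following sketch. It assumes that each connector directory under /sys/class/drm exposes a `status` file and a `modes` file, and that the first listed mode of the first connected connector is the one to use; the function names `parse_mode` and `detect_resolution` are hypothetical and are not taken from the patent. A production implementation may need libdrm or the compositor to confirm which mode is actually active.

```python
import glob
import os
import re


def parse_mode(mode: str) -> tuple[int, int]:
    """Parse a DRM mode string such as '1920x1080' into (width, height)."""
    match = re.match(r"^(\d+)x(\d+)", mode.strip())
    if match is None:
        raise ValueError(f"unrecognized mode string: {mode!r}")
    return int(match.group(1)), int(match.group(2))


def detect_resolution(drm_root: str = "/sys/class/drm") -> tuple[int, int]:
    """Return the first listed mode of the first connected connector.

    Assumption: the first line of 'modes' is the mode in use. This is a
    simplification; the listed order is not guaranteed to match the active mode.
    """
    for connector in sorted(glob.glob(os.path.join(drm_root, "*"))):
        status_path = os.path.join(connector, "status")
        modes_path = os.path.join(connector, "modes")
        if not (os.path.isfile(status_path) and os.path.isfile(modes_path)):
            continue  # not a connector directory (e.g. a bare card node)
        with open(status_path) as status_file:
            if status_file.read().strip() != "connected":
                continue
        with open(modes_path) as modes_file:
            first_mode = modes_file.readline()
        if first_mode.strip():
            return parse_mode(first_mode)
    raise RuntimeError("no connected display output found")
```

Passing `drm_root` as a parameter keeps the lookup testable against a temporary directory instead of the live sysfs tree.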
[0055] 2. Build a local communication interface:
[0056] As shown in Figure 2, the server program (RPA driver) listens on a local Unix domain socket path (e.g., /tmp/rpa_input.sock, i.e., the sock communication file) to receive control commands from the RPA control terminal (RPA software 1 through RPA software 3 in Figure 2) or the automation engine. This communication method requires no network permissions, is secure and reliable, and offers flexible deployment.
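A minimal sketch of such a listener is given below. The patent does not state how messages are framed on the socket, so this sketch assumes one JSON object per line; `open_listener`, `serve_client`, and the `handler` callback are hypothetical names introduced for illustration only.

```python
import json
import os
import socket


def open_listener(path: str) -> socket.socket:
    """Bind a listening Unix domain socket at the given filesystem path."""
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket file left by a previous run
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(8)
    return server


def serve_client(server: socket.socket, handler) -> None:
    """Accept one client and answer each newline-delimited JSON request.

    Assumption: requests and replies are framed as one JSON object per line.
    """
    conn, _ = server.accept()
    with conn, conn.makefile("r", encoding="utf-8") as reader:
        for line in reader:
            reply = handler(json.loads(line))
            conn.sendall((json.dumps(reply) + "\n").encode("utf-8"))
```

A client would connect to the same path with `socket.AF_UNIX`, send a request line, and read one reply line back. No network port is involved at any point.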
[0057] 3. Define a unified command protocol:
[0058] The client sends structured control requests in JSON format, including but not limited to the following operation instructions:
[0059] Mouse movement: {"action": "mouseMove", "x": 100, "y": 200};
[0060] Mouse click: {"action": "mouseClick", "button": "left"};
[0061] Key input: {"action": "keyPress", "key": "ENTER"};
[0062] Key combination: {"action": "keyCombo", "keys": ["CTRL", "ALT", "T"]}.
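The four request shapes listed above could be validated on the server side along the following lines. This is a sketch under assumptions: the field names come from the examples in the text, while the `REQUIRED_FIELDS` table, the `parse_request` function, and the choice to reject unknown actions with `ValueError` are illustrative and not specified by the patent.

```python
import json

# Required fields and their expected JSON types, taken from the examples above.
REQUIRED_FIELDS = {
    "mouseMove": {"x": int, "y": int},
    "mouseClick": {"button": str},
    "keyPress": {"key": str},
    "keyCombo": {"keys": list},
}


def parse_request(raw: str) -> dict:
    """Decode one JSON control request and check its required fields."""
    request = json.loads(raw)
    if not isinstance(request, dict):
        raise ValueError("request must be a JSON object")
    action = request.get("action")
    if action not in REQUIRED_FIELDS:
        raise ValueError(f"unsupported action: {action!r}")
    for field, expected in REQUIRED_FIELDS[action].items():
        value = request.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"field {field!r} must be of type {expected.__name__}")
    return request
```

Rejecting malformed requests before any event is generated keeps a faulty client from injecting partial input.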
[0063] 4. Parse and execute instructions:
[0064] The server parses the received JSON instructions, maps them to corresponding input events based on the operation type, and injects them into the system by writing them to the uinput device file, thus simulating the operation of the graphical desktop. When executing encapsulated combined events, since the kernel call speed is faster than the interface display, a delay of about 50ms needs to be added to prevent the event from being overwritten before it is rendered.
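The patent does not describe how event data is serialized before it is written to the device file. One plausible sketch, assuming the Linux `struct input_event` layout on a 64-bit system (a `struct timeval` followed by a 16-bit type, a 16-bit code, and a 32-bit value), is shown below. The numeric constants are the standard Linux input event codes; `pack_event` and `key_press_frames` are hypothetical helper names.

```python
import struct
import time

# Standard Linux input event codes.
EV_SYN, EV_KEY, EV_ABS = 0x00, 0x01, 0x03
SYN_REPORT = 0
KEY_ENTER = 28

# Assumed layout of struct input_event with native alignment:
# two C longs (seconds, microseconds), u16 type, u16 code, s32 value.
EVENT_FORMAT = "llHHi"


def pack_event(ev_type: int, code: int, value: int, timestamp=None) -> bytes:
    """Serialize one input event into the bytes written to the device file."""
    now = time.time() if timestamp is None else timestamp
    seconds = int(now)
    microseconds = int((now - seconds) * 1_000_000)
    return struct.pack(EVENT_FORMAT, seconds, microseconds, ev_type, code, value)


def key_press_frames(code: int) -> list[bytes]:
    """Press and release one key, each followed by a SYN_REPORT marker."""
    return [
        pack_event(EV_KEY, code, 1),
        pack_event(EV_SYN, SYN_REPORT, 0),
        pack_event(EV_KEY, code, 0),
        pack_event(EV_SYN, SYN_REPORT, 0),
    ]
```

Each returned frame would be written to the opened uinput device in order; the SYN_REPORT event marks the end of one group of events.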
[0065] 5. Compatibility and security design:
[0066] The proposed solution operates without relying on the graphics protocol layer, is compatible with mainstream Wayland desktop environments (such as GNOME, KDE Plasma, Sway, etc.), and ensures the security and controllability of input simulation operations through system permission control and access restriction.
[0067] 6. Access Control Mechanism
[0068] You can use udev rules to grant access permissions for /dev/uinput to specific user groups (such as input or rpa).
[0069] Unix socket files can be set with chmod 660 and assigned to a specific group;
[0070] It can be combined with the systemd-user service unit for sandbox control;
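The socket file permission item above can be sketched as follows. This is an illustration under assumptions: `bind_restricted` is a hypothetical helper, and tightening the umask before `bind` is one way (not stated in the patent) to avoid a brief window in which the socket file is more widely accessible than intended.

```python
import os
import socket


def bind_restricted(path: str, mode: int = 0o660) -> socket.socket:
    """Bind a Unix domain socket whose file is readable and writable
    only by its owner and group."""
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket file left by a previous run
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    previous_umask = os.umask(0o177)  # create the file as owner-only first
    try:
        server.bind(path)
    finally:
        os.umask(previous_umask)
    os.chmod(path, mode)  # equivalent to `chmod 660` on the socket file
    # Optionally assign the file to a dedicated group, for example with
    # shutil.chown(path, group="rpa"); the group name here is hypothetical.
    server.listen(8)
    return server
```

With these permissions, only processes running as the owner or as a member of the socket file's group can connect.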
[0071] Optionally, the service program also supports:
[0072] Background persistent mode;
[0073] Event feedback or execution feedback mechanism;
[0074] Multi-client access management;
[0075] JSON-RPC style extended command interface.
[0076] In addition, the present invention also provides an RPA mouse and keyboard control system for Wayland desktops, comprising:
[0077] The virtual input device initialization module is used to create and register a virtual input device that supports mouse and keyboard functions in user space by accessing the virtual input device interface provided by the operating system kernel; the virtual input device is configured as a touchpad type and reports standard input events to the desktop system through the kernel event system;
[0078] The screen resolution acquisition module is used to detect the currently active display output and acquire screen resolution information;
[0079] The local communication interface building module is used to enable the server program to listen on a preset local Unix domain socket path to receive control commands from the client.
[0080] The instruction protocol definition and parsing module is used to enable the client to send control requests to the local Unix domain socket path in a structured format, and the server to receive and parse the control requests;
[0081] The instruction execution and input simulation module is used to enable the server to map parsed control requests to corresponding kernel input event sequences based on the operation type and, at the same time, to inject input events into the system by writing event data to the virtual input device to simulate control operations on the graphical desktop.
[0082] To better understand the technical solution of the present invention, the present invention will be further described below with reference to specific embodiments:
[0083] Client call example
[0084] Example 1: High-precision mouse control based on resolution adaptation
[0085] Application scenario: On Ubuntu 22.04 (which uses the GNOME desktop environment by default and the Wayland protocol), the RPA robot needs to automatically click a "Confirm" button in a specific area of the screen (e.g., coordinates x:1920, y:500).
[0086] Implementation steps:
[0087] Service Startup and Resolution Acquisition: The server program starts as a background process. First, the screen resolution acquisition module detects the current active display output by calling the libdrm library. Assume the detected current screen resolution is 1920x1080.
[0088] Virtual Device Registration: The server registers a virtual input device in user space via the /dev/uinput interface. To support absolute coordinate movement (which is crucial for precise RPA control), the virtual device is explicitly configured as a touchpad and declares support for kernel event bits such as EV_ABS (absolute coordinate event), ABS_X, and ABS_Y.
[0089] Receiving instructions: The RPA client program (Python script) sends instructions to the local Unix domain socket in JSON format: {"action": "mouseMove", "x": 1920, "y": 500}.
[0090] Coordinate transformation and event injection: After receiving the instruction, the server maps the target coordinates (1920, 500) to kernel-recognizable absolute position event values (ABS_X, ABS_Y) based on the resolution (1920x1080) obtained in step 1.
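The mapping itself is not spelled out in the text. A minimal sketch, assuming the virtual device declares each axis with a range of 0 to `abs_max` and that pixel coordinates are scaled linearly onto that range, is given below; `to_abs_value` and `abs_max` are hypothetical names. Note that on a 1920-pixel-wide screen the valid columns are 0 to 1919, so this sketch clamps an x value of 1920 to the last column.

```python
def to_abs_value(pixel: int, screen_size: int, abs_max: int) -> int:
    """Scale a pixel coordinate onto a device axis range [0, abs_max].

    Assumption: the axis minimum is 0 and the mapping is linear.
    """
    if screen_size < 2:
        raise ValueError("screen_size must be at least 2 pixels")
    clamped = max(0, min(pixel, screen_size - 1))
    span = screen_size - 1
    # Integer arithmetic with rounding to the nearest axis value.
    return (clamped * abs_max + span // 2) // span
```

If the axis range is declared to match the screen (`abs_max` equal to `screen_size - 1`), the function returns the clamped pixel value unchanged; for example, `to_abs_value(500, 1080, 1079)` yields 500.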
[0091] Technical advantages: Traditional analog mice typically send relative displacements (dx, dy), which can easily lead to accumulated errors. This technology sends absolute coordinates via a virtual touchpad, ensuring precise one-step jump to the target location regardless of the mouse's current position, eliminating the need for repeated calibrations.
[0092] Click execution: The client then sends {"action": "mouseClick", "button": "left"}, which the server converts into a BTN_LEFT kernel event sequence of press and release to complete the click.
[0093] The benefits of this embodiment:
[0094] Breaking Wayland's limitations: It works directly at the kernel level without relying on GNOME-specific plugins.
[0095] High precision: Combining automatic resolution detection and absolute coordinate mapping, it solves the problem that traditional tools cannot obtain screen coordinates or have coordinate drift in Wayland.
[0096] Example 2: Simulation of Complex Input Using Combination Keys with Timing Control
[0097] Application scenario: Automated scripts require opening a terminal (shortcut key Ctrl+Alt+T), waiting for the terminal to appear, entering a command and pressing Enter.
[0098] Implementation steps:
[0099] Establishing a connection: The client connects to the local Unix domain socket /tmp/rpa_input.sock that the server is listening on.
[0100] Sending key combination commands: The client sends a JSON command: {"action": "keyCombo", "keys": ["KEY_LEFTCTRL", "KEY_LEFTALT", "KEY_T"]}.
[0101] Parsing and Delayed Processing: The server parses the instruction. During execution, the program does not write all events at once, but instead uses the following logic:
[0102] Write KEY_LEFTCTRL (press) -> insert 50ms delay;
[0103] Write KEY_LEFTALT (press) -> Insert 50ms delay;
[0104] Write KEY_T (press) -> Insert 50ms delay;
[0105] Release the buttons in sequence.
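The press, delay, and release logic above can be sketched as a list of steps. The key codes are the standard Linux values for these three keys. Two points are assumptions of this sketch: keys are released in reverse press order (the text only says "in sequence"), and the same 50 ms pause follows each release. `combo_steps`, `run_steps`, and the `emit` callback are hypothetical names.

```python
import time

# Standard Linux input event codes.
EV_SYN, EV_KEY = 0x00, 0x01
SYN_REPORT = 0
KEY_CODES = {"KEY_LEFTCTRL": 29, "KEY_LEFTALT": 56, "KEY_T": 20}


def combo_steps(keys, delay_s: float = 0.05):
    """Build (event_type, code, value, pause_after) steps for a key combination."""
    codes = [KEY_CODES[name] for name in keys]
    steps = []
    for code in codes:  # press each key in the order given
        steps.append((EV_KEY, code, 1, 0.0))
        steps.append((EV_SYN, SYN_REPORT, 0, delay_s))
    for code in reversed(codes):  # assumed: release in reverse order
        steps.append((EV_KEY, code, 0, 0.0))
        steps.append((EV_SYN, SYN_REPORT, 0, delay_s))
    return steps


def run_steps(steps, emit, sleep=time.sleep) -> None:
    """Send each step through emit() and pause where a delay is requested."""
    for ev_type, code, value, pause in steps:
        emit(ev_type, code, value)
        if pause:
            sleep(pause)
```

Here `emit` stands for whatever function writes one event to the virtual input device; injecting `sleep` as a parameter lets the timing be checked without actually waiting.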
[0106] Technical advantages: As noted in the description above, kernel calls execute faster than the interface renders. If all events are injected instantly, the Wayland compositor may not be able to process them in time, causing hotkeys to become ineffective. The built-in 50 ms delay mechanism addresses this "key swallowing" problem.
[0107] Text input and Enter key: After the terminal is opened, the client sends a series of keystroke commands to input characters, and finally sends {"action": "keyPress", "key": "ENTER"}.
[0108] The benefits of this embodiment:
[0109] High stability: Through millisecond-level timing control, the success rate of complex key combination operations on the graphical interface is guaranteed, and the automated process is prevented from being interrupted due to system response delay.
[0110] Good versatility: This key combination logic is applicable to all desktops that follow the Linux input subsystem, and is not limited to specific window manager shortcut definition methods.
[0111] Example 3: Multi-client permission management in a secure isolation environment
[0112] Application scenario: Running automated operation and maintenance tasks on an enterprise-level server. This server is logged in by multiple users at the same time, and the security requirements are extremely high, prohibiting any network ports from being open.
[0113] Implementation steps:
[0114] Sandboxed service deployment: The system administrator configures the server program as a user service unit through systemd and sets udev rules: KERNEL=="uinput", GROUP="rpa-users", MODE="0660". This ensures that only processes belonging to the rpa-users group can access the virtual input device.
[0115] Local communication interface setup: The server starts and listens on the Unix domain socket. The program automatically sets the permissions of the socket file to allow only specific user groups to read and write (e.g., chmod 660).
[0116] Technical advantages: Compared to tools like ydotool that may require sudo privileges or open network ports, this technical solution is entirely based on a local file system permission model. External attackers cannot remotely manipulate the mouse and keyboard over the network.
[0117] Concurrent access from multiple clients: Two different automation scripts (client A and client B) run simultaneously. The server leverages the features of Unix sockets to manage multiple connections concurrently via select or epoll mechanisms.
[0118] Client A sends mouse movement commands to perform UI testing.
[0119] Client B sends keyboard commands to input data in the background. The server processes these commands in the order they are received, without any conflicts.
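A single-threaded way to serve several clients in arrival order is sketched below using Python's `selectors` module, which selects an epoll-based implementation on Linux when available. The `CommandServer` class, its method names, and the newline-delimited JSON framing are assumptions made for illustration; the patent only states that select or epoll is used.

```python
import json
import selectors
import socket


class CommandServer:
    """Multiplex several clients on one listening socket in a single thread."""

    def __init__(self, listener: socket.socket, handler):
        self.selector = selectors.DefaultSelector()
        self.handler = handler
        self.buffers = {}  # per-connection bytes not yet forming a full line
        listener.setblocking(False)
        self.selector.register(listener, selectors.EVENT_READ, self._accept)

    def _accept(self, listener: socket.socket) -> None:
        conn, _ = listener.accept()
        conn.setblocking(False)
        self.buffers[conn] = b""
        self.selector.register(conn, selectors.EVENT_READ, self._read)

    def _read(self, conn: socket.socket) -> None:
        data = conn.recv(4096)
        if not data:  # client closed the connection
            self.selector.unregister(conn)
            del self.buffers[conn]
            conn.close()
            return
        self.buffers[conn] += data
        # Assumed framing: one JSON request per line.
        while b"\n" in self.buffers[conn]:
            line, self.buffers[conn] = self.buffers[conn].split(b"\n", 1)
            self.handler(json.loads(line))

    def poll(self, timeout: float = 0.1) -> None:
        """Process whatever sockets are ready; call repeatedly in a loop."""
        for key, _ in self.selector.select(timeout):
            key.data(key.fileobj)
```

Because every request passes through one handler in one thread, mouse commands from one client and keyboard commands from another are executed one at a time rather than interleaved mid-command.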
[0120] The benefits of this embodiment:
[0121] Extremely high security: It conforms to Wayland's "security isolation" principle, does not compromise the overall security model of the system, and does not require root privileges to run client scripts (only in a specific group).
[0122] Easy to integrate: This architecture is well-suited for integration into CI/CD pipelines, Docker containers, or Flatpak sandbox applications, addressing the pain point of existing automation tools being difficult to deploy in containers.
[0123] This invention proposes an RPA mouse and keyboard control method suitable for the Wayland desktop environment, which solves the technical problem that traditional automation tools cannot inject input events under the Wayland protocol, and realizes high-precision, highly versatile, safe and controllable desktop input simulation.
[0124] In terms of system architecture, a layered architecture of "user space virtual device + local socket communication" is adopted. Virtual input devices are created through uinput and remote commands are received in conjunction with Unix Socket, which decouples the control logic from input injection and improves the maintainability and scalability of the system.
[0125] In terms of protocol design, a JSON-based structured control instruction set is defined, which supports common operations such as mouse movement, clicking, keyboard input, and key combinations. The instruction format is unified and easy to parse, making it easy to integrate with various RPA engines and scripting languages.
[0126] In terms of compatibility, the virtual device is registered as a touchpad type and the screen resolution is dynamically obtained. It adapts to Wayland's mechanism for handling absolute coordinates, effectively bypassing its security isolation restrictions. It can run stably in mainstream Wayland desktop environments such as GNOME, KDE, and Sway without relying on specific window managers or compositor extensions.
[0127] In terms of control precision, it supports absolute coordinate positioning and key combination timing control. By inserting a reasonable delay (such as 50ms) between key events, it ensures that complex operations (such as Ctrl+Alt+T) can be correctly recognized by the target application, meeting the requirements of RPA scenarios for operation realism and reliability.
[0128] In terms of security and deployment, it adopts local Unix domain socket communication, does not expose network ports, and implements access control by combining file permissions and user group mechanisms, balancing security and flexibility. It also supports deployment in containerized environments such as Docker and Flatpak, adapting to modern software distribution models.
[0129] In summary, this invention is a Wayland desktop input control solution with a reasonable architecture, strong compatibility, precise control, and high security and reliability, providing underlying support capabilities for applications such as RPA and automated testing.
[0130] The above description is merely a detailed explanation of preferred embodiments and principles of the present invention. For those skilled in the art, there may be changes in specific implementation methods based on the ideas provided by the present invention, and these changes should also be considered within the scope of protection of the present invention.
Claims
1. An RPA mouse and keyboard control method for Wayland desktop, characterized in that it includes the following steps: S1, initialize the virtual input device: in user space, a virtual input device that supports mouse and keyboard functions is created and registered by accessing the virtual input device interface provided by the operating system kernel; the virtual input device is configured as a touchpad type and declares support for absolute coordinate events, so as to report standard input events based on absolute coordinates to the Wayland desktop system through the kernel event system, thereby enabling absolute coordinate positioning of the mouse pointer while Wayland restricts external programs from obtaining the global coordinates of the mouse; S2, get screen resolution: detect the current active display output and obtain screen resolution information; S3, build the local communication interface: the server program listens on a preset local Unix domain socket path to receive control commands from clients; the local Unix domain socket file is access controlled based on local file system permissions and does not require open network ports; S4, define and parse the instruction protocol: the client sends a control request in a structured format to the local Unix domain socket path, and the server receives and parses the control request; S5, execute the instruction and simulate input: the server maps the parsed control requests to the corresponding kernel input event sequence according to the operation type; at the same time, it injects the input events into the system by writing event data to the virtual input device to simulate control operations on the graphical desktop.
2. The RPA mouse and keyboard control method for Wayland desktop according to claim 1, characterized in that, in step S2, the methods for obtaining screen resolution information include obtaining the current framebuffer information by parsing the sysfs file system, calling the udev library interface, accessing the logind D-Bus interface, or calling the libdrm / gbm library.
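Of the options listed, the sysfs route can be sketched as follows. This is a minimal sketch, assuming the usual DRM layout in which each connector directory under `/sys/class/drm` holds a `status` file and a `modes` file; the function names are hypothetical, and treating the first listed mode as the active one is an assumption that may not hold on every system.

```python
import re
from pathlib import Path


def parse_mode(line):
    """Parse a mode string such as '1920x1080' into (width, height)."""
    match = re.match(r"(\d+)x(\d+)", line.strip())
    if match is None:
        raise ValueError(f"unrecognized mode string: {line!r}")
    return int(match.group(1)), int(match.group(2))


def detect_resolution(drm_root="/sys/class/drm"):
    """Return (width, height) of the first connected connector, or None."""
    for connector in sorted(Path(drm_root).glob("card*-*")):
        status = connector / "status"
        modes = connector / "modes"
        if not status.is_file() or not modes.is_file():
            continue
        if status.read_text().strip() != "connected":
            continue
        lines = modes.read_text().split()
        if lines:
            # Assumption: the first listed mode is the one in use.
            return parse_mode(lines[0])
    return None
```

The `drm_root` parameter exists so that the function can be exercised against a directory tree other than the live sysfs mount.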
3. The RPA mouse and keyboard control method for Wayland desktop according to claim 2, characterized in that, in step S3, the access permissions of the local Unix domain socket file are set so that only specific users or user groups can read and write it.
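One possible way to restrict the socket file to an owner and a group is sketched below. The function name and the mode `0o660` are illustrative assumptions; the claim does not specify a particular permission value.

```python
import os
import socket
import stat


def bind_restricted(path, mode=0o660):
    """Bind a listening Unix domain socket whose file is limited to owner and group."""
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket file left by an earlier run
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    # Tighten the umask first so the file is never briefly world-accessible.
    previous = os.umask(0o777 & ~mode)
    try:
        server.bind(path)
    finally:
        os.umask(previous)
    os.chmod(path, mode)  # make the final mode explicit
    server.listen()
    return server
```

Assigning the file to a dedicated group (for example with `os.chown`) would complete the restriction; the group to use is deployment-specific and is left out of the sketch.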
4. The RPA mouse and keyboard control method for Wayland desktop according to claim 3, characterized in that, in step S4, the control request includes at least mouse movement, mouse click, key input, and key combination operation instructions.
5. The RPA mouse and keyboard control method for Wayland desktop according to claim 4, characterized in that, in step S4, the structured format is JSON format; the mouse movement instruction includes absolute coordinate parameters, and the server converts the absolute coordinates into the corresponding ABS_X and ABS_Y absolute position event values according to the obtained screen resolution.
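The conversion from pixel coordinates to ABS_X / ABS_Y values could look like the following sketch. The JSON field names (`action`, `x`, `y`) and the axis maximum of 65535 are assumptions made for illustration; the claim does not state the axis range declared by the virtual device, so the scaling shown here is one plausible choice rather than the patented formula.

```python
import json

ABS_MAX = 65535  # assumed axis maximum declared when the device was created


def scale_axis(pixel, size, abs_max=ABS_MAX):
    """Map a pixel index in [0, size - 1] onto the axis range [0, abs_max]."""
    if size < 2:
        return 0
    pixel = min(max(pixel, 0), size - 1)  # clamp off-screen coordinates
    return round(pixel * abs_max / (size - 1))


def parse_move(payload, width, height):
    """Parse a JSON move request and return (ABS_X value, ABS_Y value)."""
    request = json.loads(payload)
    if request.get("action") != "move":
        raise ValueError("not a move request")
    return (
        scale_axis(int(request["x"]), width),
        scale_axis(int(request["y"]), height),
    )
```

With this mapping the top-left pixel lands on axis value 0 and the bottom-right pixel on the axis maximum, so the result is independent of the actual screen resolution once the device range is fixed.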
6. The RPA mouse and keyboard control method for Wayland desktop according to claim 5, characterized in that, in step S5, during the simulated control operation on the graphical desktop, for key combination operations, a preset delay is inserted between the execution of successive key events; the preset delay is 50 milliseconds.
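The delayed key combination can be sketched as below. The press-in-order, release-in-reverse ordering and the `emit` callback are assumptions for illustration; only the 50 ms delay between key events comes from the claim. The key codes are the standard Linux values for left Ctrl and C.

```python
import time

EV_SYN, EV_KEY, SYN_REPORT = 0x00, 0x01, 0
KEY_LEFTCTRL, KEY_C = 29, 46  # codes from <linux/input-event-codes.h>
COMBO_DELAY = 0.050  # the 50 ms preset delay, in seconds


def combo_steps(keys):
    """Press keys in the given order, then release them in reverse order."""
    presses = [(code, 1) for code in keys]
    releases = [(code, 0) for code in reversed(keys)]
    return presses + releases


def send_combo(keys, emit, delay=COMBO_DELAY, sleep=time.sleep):
    """Emit each key event followed by a sync report, pausing between key events."""
    steps = combo_steps(keys)
    for index, (code, value) in enumerate(steps):
        emit(EV_KEY, code, value)
        emit(EV_SYN, SYN_REPORT, 0)
        if index < len(steps) - 1:
            sleep(delay)
```

A call such as `send_combo([KEY_LEFTCTRL, KEY_C], emit)` would produce a Ctrl+C sequence, where `emit` is whatever hypothetical function writes a single event to the virtual device.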
7. The RPA mouse and keyboard control method for Wayland desktop according to claim 6, characterized in that it further includes an access control process, as follows: access permissions for the virtual input device are granted to a specific user group by configuring udev rules, or sandboxed permission control is applied to the server program by means of a user service unit of the system daemon systemd.
8. The RPA mouse and keyboard control method for Wayland desktop according to claim 1, characterized in that the server program runs persistently in the background and supports multi-client access management as well as an extended command interface in JSON-RPC style.
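A JSON-RPC style command interface could be organized as in the following sketch. The method name `mouse.move`, the registry, and the handler are hypothetical; the envelope fields and error codes follow the JSON-RPC 2.0 specification. Multi-client handling and the actual event injection are outside the scope of the sketch.

```python
import json

METHODS = {}


def method(name):
    """Decorator that registers a handler under a method name."""
    def register(func):
        METHODS[name] = func
        return func
    return register


@method("mouse.move")
def mouse_move(x, y):
    # Placeholder: a real handler would inject ABS_X / ABS_Y events here.
    return {"moved_to": [x, y]}


def reply(request_id, **body):
    return json.dumps({"jsonrpc": "2.0", "id": request_id, **body})


def handle(raw):
    """Process one request string and return the response string."""
    try:
        request = json.loads(raw)
    except json.JSONDecodeError:
        return reply(None, error={"code": -32700, "message": "Parse error"})
    if not isinstance(request, dict) or "method" not in request:
        return reply(None, error={"code": -32600, "message": "Invalid Request"})
    func = METHODS.get(request["method"])
    if func is None:
        return reply(request.get("id"), error={"code": -32601, "message": "Method not found"})
    try:
        result = func(**request.get("params", {}))
    except TypeError:
        return reply(request.get("id"), error={"code": -32602, "message": "Invalid params"})
    return reply(request.get("id"), result=result)
```

New commands are added by registering another function with `@method(...)`, which is one way to keep the interface extensible without changing the dispatcher.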
9. An RPA mouse and keyboard control system for Wayland desktop, used to implement the RPA mouse and keyboard control method for Wayland desktop as described in any one of claims 1-8, characterized in that the RPA mouse and keyboard control system for Wayland desktop includes:
a virtual input device initialization module, used to create and register, in user space, a virtual input device that supports mouse and keyboard functions by accessing the virtual input device interface provided by the operating system kernel; the virtual input device is configured as a touchpad type and reports standard input events to the desktop system through the kernel event system;
a screen resolution acquisition module, used to detect the currently active display output and acquire screen resolution information;
a local communication interface building module, used to enable the server program to listen on a preset local Unix domain socket path to receive control commands from the client;
an instruction protocol definition and parsing module, used to enable the client to send control requests in a structured format to the local Unix domain socket path, and the server to receive and parse the control requests;
an instruction execution and input simulation module, used to enable the server to map the parsed control requests to the corresponding kernel input event sequences according to the operation type, and to inject the input events into the system by writing event data to the virtual input device, so as to simulate control operations on the graphical desktop.