A GPU memory virtualization method and apparatus based on deep parsing of the ioctl protocol

By intercepting the ioctl protocol in user space, parsing and auditing GPU memory requests, constructing a shadow object tree and implementing view virtualization, the security and stability issues of memory isolation in multi-tenant environments are solved, achieving efficient GPU resource sharing and seamless adaptation to deep learning frameworks.

CN122309140A · Pending · Publication Date: 2026-06-30 · XI AN JIAOTONG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
XI AN JIAOTONG UNIV
Filing Date
2026-03-23
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing GPU virtualization technologies struggle to simultaneously achieve the security, stability, metering accuracy, and application compatibility of memory isolation in multi-tenant containerized environments, leading to resource over-selling, system instability, or the inability of deep learning frameworks to function properly.

Method used

The method intercepts ioctl system call requests in user space, parses memory-related instructions, constructs a shadow object tree for intelligent auditing, checks quotas during the physical admission control phase, and implements view virtualization to achieve precise isolation and sharing.

Benefits of technology

It combines kernel-level control with user-space deployment flexibility, solves the problem of resource overselling, ensures system stability and the normal operation of deep learning frameworks, and reduces system maintenance costs and latency.

✦ Generated by Eureka AI based on patent content.

Figure CN122309140A_ABST

Abstract

This invention discloses a GPU memory virtualization method and apparatus based on deep parsing of the ioctl protocol, belonging to the field of electronic information technology. The method includes: intercepting ioctl system call requests initiated by the target container to obtain the requests used for GPU memory management; parsing their parameter blocks to extract memory-related instructions for subsequent auditing; constructing a shadow object tree and intelligently auditing memory allocation and mapping operations to obtain a precise memory usage ledger; determining, during the physical admission control phase, whether the quota has been exceeded and intercepting over-quota requests, to obtain a control result on whether instruction pass-through is allowed; and implementing view virtualization and returning virtualized memory information to the application in the target container to achieve transparent application adaptation, thereby completing precise virtualization isolation and multi-tenant secure sharing of GPU memory. This invention combines kernel-level control strength with user-space deployment flexibility.

Description

Technical Field

[0001] This invention belongs to the field of electronic information technology, and specifically relates to a GPU memory virtualization method and apparatus based on deep parsing of the ioctl protocol.

Background Technology

[0002] With the rapid development of artificial intelligence technology, the demand for massively parallel computing power in deep learning training and inference tasks is growing exponentially. Kubernetes-based cloud-native architecture has become the standard model for deploying and managing AI workloads, its core being the improvement of infrastructure elasticity and efficiency through resource pooling. GPUs, as key computing power carriers, directly impact hardware utilization and overall cluster throughput through their pooling and sharing capabilities. However, high-end GPUs are expensive, and individual tasks (especially inference or small-to-medium-scale training tasks) often cannot fully utilize the computing power and memory resources of the entire GPU card. Therefore, enabling the sharing of a single physical GPU in multi-tenant or containerized environments has become a crucial industry requirement.

[0003] The core challenge of GPU resource sharing lies in memory isolation. Current industry research primarily explores three technical approaches: software solutions at the user level that hijack or redirect CUDA library function calls; system-level solutions that delve into the operating system kernel to intercept communication between GPU drivers and hardware; and multi-instance partitioning technologies that rely on native GPU hardware support. These solutions have already been applied in proof-of-concept studies and in certain scenarios.

[0004] However, existing GPU virtualization technologies still have significant shortcomings in terms of security, stability, measurement accuracy, and application compatibility in achieving memory isolation, which has become a major technical problem restricting large-scale production deployment.

[0005] Deficiencies and shortcomings of existing technologies. Current GPU memory virtualization technologies primarily achieve resource sharing through user-space API hijacking, kernel module interception, and hardware multi-instance partitioning. However, in multi-tenant containerized environments, these solutions struggle to simultaneously achieve the security, stability, metering accuracy, and application compatibility of memory isolation, leading to resource over-selling, system instability, or the inability of existing deep learning frameworks to function properly. This core deficiency has become a major technical problem restricting the efficient pooling and large-scale production deployment of GPUs.

Summary of the Invention

[0006] This invention provides a GPU memory virtualization method and apparatus based on deep parsing of the ioctl protocol. The purpose is to solve the problem that existing technologies in multi-tenant containerized environments are unable to simultaneously achieve the security, stability, metering accuracy and application compatibility of memory isolation, which leads to resource over-selling, system instability or the inability of existing deep learning frameworks to operate normally.

[0007] To achieve the above objectives, the present invention adopts the following technical solution: A GPU memory virtualization method based on deep parsing of the ioctl protocol, including: intercepting the ioctl system call requests initiated by the target container to obtain the ioctl system call requests used for GPU memory management; based on the intercepted ioctl system call request, parsing its parameter block and extracting memory-related instructions for subsequent auditing; based on the extracted memory-related instructions, constructing a shadow object tree and performing intelligent auditing of memory allocation and mapping operations to obtain an accurate memory usage ledger; based on the accurate memory usage ledger obtained from the audit, determining during the physical admission control phase whether the quota has been exceeded and executing interception, to obtain the control result of whether instruction pass-through is allowed; and, based on the control result, implementing view virtualization and returning virtualized memory information to the application in the target container to achieve transparent application adaptation, thereby completing precise virtualization isolation and multi-tenant secure sharing of GPU memory.

[0008] A further improvement of this invention is that the interception of the ioctl system call request initiated by the target container specifically includes: using the Linux dynamic linking preloading mechanism to take over the ioctl function of the C standard library in user space; and, based on the device magic number and command codes of the NVIDIA driver, filtering the memory allocation instructions, address mapping instructions, heap management instructions, and deallocation instructions out of all ioctl requests.

[0009] A further improvement of this invention is that the step of parsing its parameter block and extracting video memory-related instructions specifically includes: traversing the memory regions pointed to by the ioctl parameter pointers and identifying fields that conform to the page alignment characteristics of 4KB, 64KB, or 2MB and have a reasonable numerical range; using a greedy maximum value matching strategy to select the maximum reasonable value as the physical memory allocation size for this request; and simultaneously parsing the resource type identifier, so that subsequent isolation logic is executed for video memory resources.

[0010] A further improvement of the present invention is that the construction of the shadow object tree and the intelligent auditing of video memory allocation and mapping operations specifically include: maintaining a global shadow object tree in user mode, recording handles, video memory size, resource type and ownership relationship; recording the initial billing for each handle that is successfully returned by a physical allocation instruction; and, for subsequent mapping instructions, first querying the shadow tree: if the handle already exists, marking the operation as deduplicated; if it is an unknown handle, performing supplementary billing. This enables strict synchronization between the ledger and physical consumption in scenarios where allocation and mapping are decoupled.

[0011] A further improvement of this invention is that the step of determining whether the quota has been exceeded and executing the interception during the physical admission control phase specifically includes: before each instruction that may increase memory usage is sent to the actual driver, calculating the sum of the current cumulative usage and the current request amount; and, if the sum exceeds the video memory quota set for the container, intercepting the ioctl request and returning a CUDA_ERROR_OUT_OF_MEMORY or ENOMEM error code to the application.

[0012] A further improvement of the present invention is that the step of implementing view virtualization based on the control result and returning the virtualized video memory information to the application specifically includes: hooking the video memory query interfaces cuDeviceTotalMem, cuMemGetInfo, or cudaGetDeviceProperties; and returning virtualized values to the application, namely a total video memory capacity equal to the user-defined quota value and a remaining video memory capacity equal to the quota minus the current shadow tree usage. This forces the cache allocators of the PyTorch and TensorFlow frameworks to automatically converge to the quota range, achieving transparent adaptation without code modification.

[0013] A GPU memory virtualization device based on deep parsing of the ioctl protocol, comprising: an ioctl system call request acquisition unit, which intercepts the ioctl system call requests initiated by the target container and obtains the ioctl system call requests used for GPU memory management; a video memory related instruction parsing unit, which, based on the intercepted ioctl system call request, parses its parameter block and extracts video memory related instructions for subsequent auditing; an accurate video memory usage ledger acquisition unit, which, based on the extracted video memory related instructions, constructs a shadow object tree and performs intelligent auditing of video memory allocation and mapping operations to obtain an accurate video memory usage ledger; a control result acquisition unit, which, based on the accurate video memory usage ledger obtained from the audit, determines during the physical admission control phase whether the quota has been exceeded and performs interception, obtaining the control result of whether instruction pass-through is allowed; and an implementation unit, which, based on the control result, performs view virtualization and returns the virtualized video memory information to the application in the target container to achieve transparent application adaptation, thereby completing precise virtualization isolation and multi-tenant secure sharing of GPU video memory.

[0014] A further improvement of this invention is that, in the ioctl system call request acquisition unit, the interception of the ioctl system call request initiated by the target container specifically includes: using the Linux dynamic linking preloading mechanism to take over the ioctl function of the C standard library in user space; and, based on the device magic number and command codes of the NVIDIA driver, filtering the memory allocation instructions, address mapping instructions, heap management instructions, and deallocation instructions out of all ioctl requests.

[0015] An electronic device includes a processor and a memory, wherein the memory stores a computer program, and the processor executes the computer program to implement the GPU memory virtualization method based on deep parsing of the ioctl protocol.

[0016] A computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the GPU memory virtualization method based on deep parsing of the ioctl protocol.

[0017] Compared with the prior art, the present invention has at least the following beneficial technical effects: I. Combining Kernel-Level Control and User-Space Deployment Flexibility. This invention resolves, at the architectural level, the long-standing contradiction between "strong isolation security" and "high system availability" in the GPU virtualization field, achieving an isolation paradigm that combines kernel-level control with user-space deployment flexibility. In existing technologies, user-space solutions based on CUDA Runtime API hijacking are lightweight to deploy but have overly loose defense boundaries. Malicious users can bypass the interception layer by statically compiling applications, directly calling the underlying Driver API, or dynamically loading driver symbols, rendering resource quota restrictions ineffective. Kernel module-based interception solutions establish a defense at Ring 0, but they introduce complex business logic into kernel space. Any minor code flaw (such as a null pointer dereference or a deadlock) can trigger a kernel panic in the host operating system and crash the entire machine. This large "blast radius" makes large-scale deployment in production environments extremely difficult. This invention instead establishes its defense at the System Call Interface (SCI) between the user-space driver library (libcuda.so) and the kernel. Since ioctl is the essential gateway from user space to the kernel driver, no matter how upper-layer applications obfuscate the call path, their requests still pass through the protocol-level monitoring of this invention. This achieves a strong isolation effect comparable to kernel modules without modifying the kernel code. Furthermore, because the interception code runs entirely within the target process's user space, even in the event of an abnormal crash, its impact is limited to a single container and does not affect the host machine or other tenant tasks. This meets the stringent requirements of cloud-native environments for high availability and fault isolation.

[0018] II. A Heuristic Protocol Parsing Mechanism Independent of Specific Driver Data Structures. This invention constructs a heuristic protocol parsing mechanism independent of specific driver data structures, overcoming the high maintenance costs and poor compatibility caused by fragmentation of closed-source driver versions. NVIDIA GPU drivers are closed-source, and the parameter structure definitions of their underlying communication protocols (especially core memory allocation commands such as 0x2b) are not publicly disclosed. Furthermore, as driver versions iterate (for example, alongside upgrades from the Ampere architecture to the Ada Lovelace architecture), the offsets and memory layouts of internal fields frequently change. Traditional techniques often rely on hard-coded offsets for parsing, so isolation tools fail as soon as the driver is upgraded, requiring significant manpower for reverse engineering and adaptation. This invention abandons the dependence on static structure definitions and instead uses the inherent characteristics of physical memory allocation, namely page alignment and a reasonable numerical range, to design a robust memory feature scanning algorithm. The system automatically "sniffs out" fields with reasonable values that conform to the alignment characteristics of 4KB, 64KB, or 2MB from protocol parameter blocks of unknown layout, and locks onto the actual physical request size through a "greedy maximum value strategy." This content-based generalized parsing capability enables the present invention to adaptively support a wide range of driver versions and hardware models with a single codebase, significantly reducing the marginal maintenance cost of the system.

[0019] III. Physical Precision Ledger Synchronization. This invention addresses the common problems of "double billing" false positives and "implicit allocation" false negatives in video memory virtualization by constructing a full lifecycle shadow resource auditing system, keeping the ledger synchronized with physical consumption. In modern GPU driver architectures, physical video memory allocation and virtual address mapping are two decoupled, independent operations with complex reference relationships. Existing technologies often simply intercept all related instructions and accumulate their sizes. As a result, either the same video memory is billed multiple times (double billing), so users encounter false OOM errors when the video memory is not actually exhausted, or implicit mapping channels such as DMA go unidentified, leading to resource over-selling. This invention reconstructs, in user space, a "shadow object tree" synchronized with the kernel state and implements intelligent deduplication and omission correction logic: when intercepting a mapping instruction, the system looks up the registry state. If the resource handle was already billed during the allocation stage, the operation is exempted from counting; if the handle is unknown, it is treated as an escaped resource and is forcibly added to the registry. This auditing mechanism of "Alloc master control and Map leak filling" keeps the quota ledger at the software level precisely synchronized with the physical consumption at the hardware level, maximizing the effective utilization of expensive GPU resources.

[0020] IV. Addressing the Pain Point of Deep Learning Frameworks Crashing During Startup in Memory-Constrained Environments. This invention uses view virtualization to construct an application-transparent adaptation layer, resolving the industry-wide pain point of "greedy" deep learning frameworks crashing during startup in memory-constrained environments. Mainstream AI frameworks such as PyTorch and TensorFlow employ aggressive memory management strategies to optimize performance, namely, reading the total physical memory of the GPU during startup initialization and attempting to pre-allocate it proportionally (e.g., 90%). In multi-tenant memory allocation scenarios (e.g., limiting containers to 4GB on a 24GB graphics card), this greedy behavior immediately triggers the system's admission control, causing the application to crash and forcing users to modify code or set cumbersome configuration parameters. This invention hooks the upper-layer memory query interface and returns virtualized hardware specification data (i.e., the user-defined quota value) to the application. This causes the upper-layer framework's memory management module to automatically converge its internal cache pool limit to the range specified by this invention and dynamically adjust its garbage collection threshold. Existing AI applications can therefore be migrated to the restricted containers managed by this system and run stably without modifying a single line of code, achieving "seamless virtualization".

[0021] V. In-process dynamic injection technology offers extremely low latency and system overhead. This invention employs in-process dynamic injection technology, which, compared to remote forwarding or proxy modes, offers extremely low latency and system overhead, making it suitable for high-performance computing scenarios. Unlike earlier API forwarding schemes (such as rCUDA) that require transmitting instructions between the front-end and back-end via network protocol stacks or inter-process communication (IPC), resulting in significant performance losses, this invention uses the Linux dynamic linking preloading (LD_PRELOAD) mechanism to directly load the interception logic into the target process's address space. All protocol parsing, ledger verification, and policy execution are completed within the process via function calls, without involving additional data copying or context switching. Real-world testing shows that the additional latency introduced by this invention to GPU computing tasks is negligible (typically less than 1%), supporting throughput- and latency-sensitive model training and inference tasks and ensuring resource isolation without sacrificing the high-performance characteristics of heterogeneous accelerators.

Attached Figure Description

[0022] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0023] Figure 1 is a diagram showing the overall system architecture based on ioctl driver protocol interception;
Figure 2 is a flowchart for processing video memory allocation instructions;
Figure 3 is a flowchart for command code recognition;
Figure 4 is a sequence diagram for the interaction between view virtualization and application startup;
Figure 5 is a structural block diagram of the device of the present invention.

Detailed Implementation

[0024] In the following description, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments can be modified in various ways without departing from the spirit or scope of the invention. Therefore, the drawings and description are considered to be exemplary in nature and not restrictive.

[0025] In the description of this invention, it should be understood that, when used in this specification and the appended claims, the terms "comprising" and "including" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.

[0026] It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms unless the context clearly indicates otherwise.

[0027] It should also be further understood that the term "and/or" as used in this specification and the appended claims refers to any combination of one or more of the associated listed items and all possible combinations, and includes such combinations.

[0028] The accompanying drawings illustrate various structural schematic diagrams according to embodiments disclosed in this invention. These drawings are not to scale, and some details have been enlarged for clarity, and some details may have been omitted. The shapes of the various regions and layers shown in the drawings, as well as their relative sizes and positional relationships, are merely exemplary and may deviate from reality due to manufacturing tolerances or technical limitations. Furthermore, those skilled in the art can design regions / layers with different shapes, sizes, and relative positions as needed.

[0029] The embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

[0030] In the description of this invention, unless otherwise explicitly defined, terms such as "set up," "install," and "connect" should be interpreted broadly, and those skilled in the art can reasonably determine the specific meaning of the above terms in this invention in conjunction with the specific content of the technical solution.

Terminology Explanation

I. Infrastructure and Interception Technology Terminology

[0031] 1. User-space Driver Protocol Interception. The core technical approach of this invention: using the operating system's dynamic linking mechanism (such as Linux's LD_PRELOAD) to intercept and proxy the C standard library's ioctl calls between the user-space driver library (such as libcuda.so) and the operating system kernel. This technique audits GPU instructions at the binary communication protocol level and, unlike high-level API hijacking, provides strong isolation that cannot be bypassed by static compilation.

[0032] 2. Input/Output Control (ioctl). A standard system call interface used by user-mode programs to communicate with device drivers. In the NVIDIA GPU architecture, all upper-layer memory allocation, mapping, and deallocation operations are ultimately translated into specific ioctl command codes (such as 0x54, 0x2b, 0x4e) and sent to the kernel. This invention achieves control over hardware resources by intercepting this interface.

[0033] 3. Dynamic Link Library Interception (LD_PRELOAD Interception). A technique used by this invention in a Linux environment. By preloading a custom shared library, it ensures that when an application calls a system function (such as ioctl), the call first enters the hook function defined by this invention. This mechanism enables "non-intrusive" implantation into the target process.

II. Memory Audit Terminology

[0034] 4. Shadow Object Tree / Registry. A global data structure (registry) maintained in user space by this invention. Since the driver protocol transmits opaque integer handles, the system needs to reconstruct the lifecycle view of each video memory object in user space. The registry records the handle, video memory size, resource type (Class ID), and ownership relationship parsed from the protocol parameters, serving as the data foundation for accurate auditing.

[0035] 5. Heuristic Parameter Scanning. A parsing method used by this invention for closed-source driver protocols. When the exact structure definition cannot be obtained or the driver version changes, this invention traverses the memory blocks of the ioctl parameters, identifies fields that meet specific physical characteristics (such as memory page alignment and a reasonable numerical range), and uses a "greedy strategy" (taking the maximum reasonable value) to infer the memory allocation size, thereby achieving adaptive compatibility with multiple driver versions.

[0036] 6. Decoupling of video memory allocation and mapping. A mode of GPU driver resource management in which the allocation of physical video memory (obtaining a handle) and the creation of virtual addresses (mapping page tables) are two independent instruction operations. This invention addresses this characteristic by designing specific auditing logic to prevent billing discrepancies.

[0037] 7. Intelligent Deduplication & Catch-up. A bidirectional verification logic used by this invention in the video memory auditing process. On the one hand, by querying the shadow object tree, duplicate mapping operations on already billed handles are ignored (deduplication), preventing premature OOM caused by double billing; on the other hand, handles that are not recorded in the shadow tree but have undergone mapping operations are forcibly billed (catch-up), to cover implicit allocation paths such as DMA, ensuring that the ledger is consistent with the physical state.

III. Control Strategies and Compatibility Terminology

[0038] 8. Physical Admission Control. A mandatory blocking mechanism implemented at the protocol layer. Before each memory allocation instruction is issued to the actual driver, the system calculates the sum of the "current cumulative usage" and the "current request amount". If the set quota is exceeded, the ioctl request is directly intercepted and an "Out of Memory (OOM)" error code is returned, thereby ensuring the mandatory isolation of resources at the physical level.
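The admission check described in this definition reduces to one comparison. The following is a minimal sketch under stated assumptions, not the invention's implementation: the function name `admit` is hypothetical, and returning a negative errno value is only one possible way to signal the rejection to the caller.

```python
import errno

GIB = 1024 ** 3


def admit(used: int, requested: int, quota: int) -> int:
    """Return 0 to pass the request through, or -ENOMEM to reject it.

    `used` is the current cumulative usage from the ledger, `requested`
    is the size of the pending request, `quota` is the container limit.
    All values are in bytes.
    """
    if used + requested > quota:
        # Over quota: the request must not reach the real driver.
        return -errno.ENOMEM
    return 0


if __name__ == "__main__":
    print(admit(3 * GIB, 1 * GIB, 4 * GIB))      # exactly fills the quota
    print(admit(3 * GIB, 1 * GIB + 1, 4 * GIB))  # one byte over the quota
```

A request that exactly fills the quota is admitted in this sketch; whether the boundary should be inclusive is a policy choice the text does not specify.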

[0039] 9. View Virtualization (Spoofing). Intercepting memory query interfaces (such as cuDeviceTotalMem and cudaGetDeviceProperties) and returning virtualized hardware specification data to the application (replacing the total physical video memory with the user-defined quota value). The purpose is to make upper-layer applications believe that they are running on a smaller-spec GPU.
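The values a hooked query would report can be sketched as below. This is an illustration only: the function name `virtual_mem_info` is hypothetical, the (free, total) ordering simply mirrors the output order of cuMemGetInfo, and clamping the free value at zero is an added assumption for the case where catch-up billing pushes usage past the quota.

```python
from typing import Tuple

GIB = 1024 ** 3


def virtual_mem_info(quota: int, ledger_used: int) -> Tuple[int, int]:
    """Return the (free, total) pair shown to the application, in bytes.

    total is the user-defined quota, not the physical capacity;
    free is the quota minus the usage recorded in the shadow ledger.
    """
    total = quota
    free = max(quota - ledger_used, 0)  # assumption: never report negative
    return free, total


if __name__ == "__main__":
    free, total = virtual_mem_info(quota=4 * GIB, ledger_used=1 * GIB)
    print(free // GIB, total // GIB)
```

With a 4GB quota and 1GB already billed, the sketch reports 4GB total and 3GB free regardless of the physical card size, so a framework that reserves 90% of the reported total would ask for about 3.6GB and stay inside the quota.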

[0040] 10. Greedy Pre-allocation. A memory management behavior of deep learning frameworks (such as PyTorch's Caching Allocator). They tend to request a large amount of GPU memory (e.g., 8GB at a time) from the driver at startup or during runtime to establish a cache pool. This invention constrains this behavior through view virtualization and environment variable configuration to prevent it from exhausting physical GPU memory.

[0041] 11. Big Page Alignment. The behavior of GPU drivers rounding up the allocated physical video memory size to a specific granularity (such as 2MB or larger) to improve performance. This invention takes this feature into account during auditing to ensure that the software ledger matches the actual hardware consumption.
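The effect of this rounding on the ledger can be illustrated with a short sketch. The helper name `round_up` is hypothetical, and the 2MB default is taken from the example granularity in the definition; the granularity a given driver actually applies is not stated in the text.

```python
MIB = 1024 ** 2


def round_up(size: int, granularity: int = 2 * MIB) -> int:
    """Round size up to the next multiple of granularity (both in bytes)."""
    return (size + granularity - 1) // granularity * granularity


if __name__ == "__main__":
    # A 3MB request would be billed as 4MB if the driver rounds to 2MB pages.
    print(round_up(3 * MIB) // MIB)
```

Billing the rounded size rather than the requested size keeps the ledger from drifting below the physical consumption when many small allocations are made.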

Example 1

I. Overall Technical Workflow

[0042] This invention proposes a virtualization shim architecture located between the user-space driver and the operating system kernel. Its core is to achieve precise control and isolation of GPU memory resources by deeply analyzing and intercepting the low-level binary communication protocol of the GPU driver. This solution aims to systematically solve key problems such as overselling of GPU memory, billing errors, and application compatibility in multi-tenant shared GPU environments, and it is implemented entirely in user space without modifying the operating system kernel or the GPU physical driver. As shown in Figure 1, the system constructs a transparent interception layer within the software stack, forming a complete closed loop from instruction capture and resource auditing to policy execution. Its implementation comprises five closely interconnected logical stages. As shown in Figure 2, the overall data processing follows the sequence "protocol interception → feature parsing → ledger verification → quota determination → view feedback".

[0043] Phase S1: Driver Protocol Interception and Command Filtering. This stage is the entry point for the entire system's perception. The system uses the Linux dynamic linking preloading (LD_PRELOAD) mechanism to inject a custom interception library into the target application's process space. This library uses symbol overriding to take over the C standard library's ioctl function, thereby inserting a hook between the user-space driver library (libcuda.so) and the kernel. All control commands that flow from applications to the GPU hardware via the CUDA driver first pass through this hook. Based on the NVIDIA driver-specific device magic number and command codes, the interception layer filters out the key instructions directly related to video memory management from the large volume of ioctl requests. As shown in Figure 3, these mainly include: physical memory allocation (e.g., 0x2b, 0x54), address mapping (e.g., 0x4e, 0x57), heap memory management (0x50), and resource release (0x21). This step screens out irrelevant compute or rendering instructions, providing a clean protocol data source for subsequent processing and controlling the "choke point" of driver communication.
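The filtering step in Phase S1 can be sketched as follows. This is a minimal illustration rather than the invention's implementation: it assumes the common Linux ioctl request encoding (command number in bits 0-7, device magic in bits 8-15), uses the command codes listed above, and treats the magic value 0x46 and the names `classify_ioctl` and `make_request` as hypothetical placeholders.

```python
from typing import Optional

# Common Linux ioctl request layout (for example on x86-64 and arm64):
# bits 0-7 command number, bits 8-15 device magic,
# bits 16-29 argument size, bits 30-31 direction.
NR_MASK = 0xFF
TYPE_SHIFT = 8
SIZE_SHIFT = 16
DIR_SHIFT = 30

# Assumption for illustration only: magic used to recognise NVIDIA requests.
ASSUMED_NV_MAGIC = 0x46

# Command codes as listed in the text, grouped by the role given there.
MEMORY_COMMANDS = {
    0x2B: "alloc",
    0x54: "alloc",
    0x4E: "map",
    0x57: "map",
    0x50: "heap",
    0x21: "free",
}


def make_request(direction: int, magic: int, nr: int, size: int) -> int:
    """Build a request number in the layout above (helper for the demo)."""
    return (direction << DIR_SHIFT) | (size << SIZE_SHIFT) | (magic << TYPE_SHIFT) | nr


def classify_ioctl(request: int) -> Optional[str]:
    """Return the memory-management role of a request, or None to pass it through."""
    magic = (request >> TYPE_SHIFT) & 0xFF
    if magic != ASSUMED_NV_MAGIC:
        return None  # not addressed to the monitored driver
    return MEMORY_COMMANDS.get(request & NR_MASK)


if __name__ == "__main__":
    print(classify_ioctl(make_request(3, ASSUMED_NV_MAGIC, 0x2B, 48)))
    print(classify_ioctl(make_request(3, ASSUMED_NV_MAGIC, 0x2A, 32)))
```

Requests that return None would be forwarded to the real ioctl unchanged, which keeps the hook out of the path of compute and rendering commands.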

[0044] Phase S2: Heuristic Parameter Analysis and Feature Extraction. NVIDIA drivers are closed-source, and their internal data structures change frequently between versions. This phase therefore abandons fragile parsing that relies on hard-coded structure offsets. When the system intercepts a complex memory allocation instruction (such as 0x2b), it faces a parameter block whose memory layout is unknown. To handle this, the invention uses a robust heuristic memory scanning algorithm: the system traverses the memory region pointed to by the parameter pointer and identifies candidate "allocation size" fields from the physical characteristics of memory allocation, namely that the requested size is usually an integer multiple of a specific page size (such as 4KB, 64KB, or 2MB) and lies within a reasonable range. To avoid mistaking fields such as "alignment granularity" for the allocation size, the algorithm applies a "greedy maximum value matching strategy": among all candidate values that match these characteristics, it selects the largest as the physical memory size of the allocation. At the same time, the system parses the resource type identifier (Class ID) so that isolation logic is executed only for video memory (such as NV01_MEMORY_LOCAL_USER). This approach adapts well across driver versions and hardware models.
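The scanning heuristic of Phase S2 might look roughly like the sketch below. It is illustrative only: the 8-byte little-endian field width, the scan stride, the 80 GiB upper bound, and the name `guess_alloc_size` are assumptions introduced here, since the text specifies only the page-size multiples and the "largest candidate wins" rule.

```python
import struct

PAGE_SIZES = (4 << 10, 64 << 10, 2 << 20)  # 4KB, 64KB, 2MB (from the text)
MIN_SIZE = 4 << 10                         # ASSUMPTION: smallest plausible size
MAX_SIZE = 80 << 30                        # ASSUMPTION: largest plausible size


def guess_alloc_size(param_block: bytes, stride: int = 8):
    """Scan a parameter block of unknown layout for the allocation size.

    Every 8-byte little-endian field that is page-aligned and within the
    plausible range is a candidate; the largest candidate is returned
    (greedy maximum value matching). Returns None if nothing qualifies.
    """
    candidates = []
    for offset in range(0, len(param_block) - 7, stride):
        (value,) = struct.unpack_from("<Q", param_block, offset)
        if not MIN_SIZE <= value <= MAX_SIZE:
            continue
        if any(value % page == 0 for page in PAGE_SIZES):
            candidates.append(value)
    return max(candidates, default=None)


# A made-up block: a handle-like value, a 64KB alignment field, a 256MB
# size field, and a small flag. The alignment field also qualifies as a
# candidate, but the larger size field wins.
block = struct.pack("<QQQQ", 0xDEADBEEF, 64 << 10, 256 << 20, 7)
print(guess_alloc_size(block) == 256 << 20)
```

Taking the maximum is what keeps an alignment or granularity field from being billed in place of the real size, at the cost of over-counting if a larger unrelated page-aligned value happens to appear in the block.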

[0045] Phase S3: Shadow Object Tree Construction and Intelligent Auditing. Driver handles are opaque, and "physical allocation" and "address mapping" are separate operations, which makes billing difficult. To address this, the stage dynamically maintains a global "Shadow Object Tree" in user space as a precise resource registry. The system applies "deduplication and omission correction" auditing logic here. When a physical memory allocation (Alloc) instruction executes successfully, the system records the new handle returned by the driver and its approved size in the Shadow Tree. When a subsequent mapping (Map) instruction arrives, the system first checks whether the target handle already exists in the tree. If it does, the memory was already billed during the allocation phase, so the mapping operation is marked as "deduplicated" and its usage is not counted again, which eliminates the "double billing" problem. Conversely, if the mapping instruction refers to an unknown handle, the memory may have been allocated through an implicit path such as DMA or an external import, and the system immediately performs "omission correction" billing to prevent resource escape. This mechanism keeps the user-space memory ledger strictly synchronized with the actual physical state of the GPU hardware.
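The deduplication and omission-correction rules of Phase S3 can be sketched as a small ledger class. This is a simplified illustration, not the patented data structure: the class and method names (`ShadowLedger`, `on_alloc`, `on_map`, `on_free`) are hypothetical, and the "tree" is reduced to a dictionary of nodes with an optional parent handle.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ShadowNode:
    handle: int
    size: int
    parent: Optional[int] = None   # owning object, if known
    origin: str = "alloc"          # "alloc" or "map" (omission correction)


class ShadowLedger:
    """User-space registry of GPU memory objects and their billed sizes."""

    def __init__(self) -> None:
        self.nodes: Dict[int, ShadowNode] = {}
        self.used = 0

    def on_alloc(self, handle: int, size: int, parent: Optional[int] = None) -> None:
        # Called after the driver reports a successful allocation.
        self.on_free(handle)  # a reused handle replaces its stale record
        self.nodes[handle] = ShadowNode(handle, size, parent, "alloc")
        self.used += size

    def on_map(self, handle: int, size: int) -> str:
        if handle in self.nodes:
            return "deduplicated"      # already billed at allocation time
        # Unknown handle: bill it now so the memory cannot escape the ledger.
        self.nodes[handle] = ShadowNode(handle, size, None, "map")
        self.used += size
        return "supplemented"

    def on_free(self, handle: int) -> int:
        node = self.nodes.pop(handle, None)
        if node is None:
            return 0
        self.used -= node.size
        return node.size


ledger = ShadowLedger()
ledger.on_alloc(0x10, 1 << 20)
print(ledger.on_map(0x10, 1 << 20), ledger.on_map(0x20, 2 << 20), ledger.used)
```

Mapping the known handle `0x10` leaves the total unchanged, while the unknown handle `0x20` is billed on first sight, which is the behaviour the paragraph above describes.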

[0046] Phase S4: Physical Access Control and Quota Enforcement. This phase is where the mandatory isolation policy is enforced. Using the accurate ledger maintained in Phase S3, the system runs a pre-check before each instruction that may increase memory usage (an allocation, or a mapping that triggers omission-correction billing) is sent to the real driver: it adds the current request to the current cumulative usage and checks whether the sum exceeds the quota threshold set for the tenant. If the check passes, the instruction is passed through normally. If the limit would be exceeded, the interception layer short-circuits the request, skips the call to the real driver, and returns a standard out-of-memory error (ENOMEM / CUDA_ERROR_OUT_OF_MEMORY) to the application. Since ioctl is the only channel for allocating physical memory, blocking at this point is equivalent to a firewall at the hardware level and ensures that the quota limit cannot be bypassed.
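The pre-check in Phase S4 reduces to a comparison made before the real driver is called. The sketch below assumes a hypothetical `QuotaGate` wrapper and a `real_call` callable standing in for the pass-through to the driver; returning a negative errno value is a modelling choice for this example, not the convention of the actual ioctl wrapper.

```python
import errno
from typing import Callable


class QuotaGate:
    """Admit or reject a request before it reaches the real driver."""

    def __init__(self, quota_bytes: int) -> None:
        self.quota = quota_bytes
        self.used = 0

    def submit(self, size: int, real_call: Callable[[int], int]) -> int:
        # Pre-check: current cumulative usage plus this request vs. the quota.
        if self.used + size > self.quota:
            return -errno.ENOMEM        # short-circuit: driver is never called
        status = real_call(size)        # pass through to the real driver
        if status == 0:
            self.used += size           # bill only successful allocations
        return status


calls = []


def fake_driver(size: int) -> int:
    """Stand-in for the real ioctl; records what actually got through."""
    calls.append(size)
    return 0


gate = QuotaGate(quota_bytes=1024)
print(gate.submit(600, fake_driver), gate.submit(600, fake_driver), calls)
```

The second request is rejected without ever being recorded in `calls`, which mirrors the requirement that an over-quota instruction is not forwarded to the driver at all.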

[0047] Phase S5: View Virtualization and Application Adaptation. To remain compatible with existing applications, especially deep learning frameworks with "greedy pre-allocation" behavior (such as PyTorch), this phase implements upward-facing view virtualization. The system hooks the key memory query interfaces (such as cuDeviceTotalMem, cuMemGetInfo, and cudaGetDeviceProperties) and dynamically rewrites the memory capacity data returned to the application layer. Specifically, the system reports the user's quota as the "total device memory" and calculates the "remaining memory" as the quota minus the usage currently recorded in the ledger. As shown in Figure 4, this mechanism leads upper-layer applications to believe they are running on a GPU with a smaller amount of video memory, so their internal memory allocators (such as the PyTorch caching allocator) automatically adjust memory pool size and allocation strategy and can initialize and run normally within the limited resource space. This closes the resource management loop and supports large amounts of existing application code transparently, without modification.
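The arithmetic behind the virtualized view in Phase S5 is short enough to show directly. The function name `virtual_mem_info` is hypothetical, the `(free, total)` return order is chosen to mirror cuMemGetInfo, and clamping the free value at zero is an assumption added here for the case where the ledger exceeds the quota.

```python
from typing import Tuple

GIB = 1 << 30


def virtual_mem_info(quota_bytes: int, ledger_used_bytes: int) -> Tuple[int, int]:
    """Return (free, total) as a hooked memory query would report them.

    total is the tenant quota; free is the quota minus the ledger usage,
    clamped at zero (the clamp is an assumption of this sketch).
    """
    total = quota_bytes
    free = max(0, quota_bytes - ledger_used_bytes)
    return free, total


free, total = virtual_mem_info(8 * GIB, 3 * GIB)
print(free // GIB, total // GIB)  # prints "5 8"
```

A framework that sizes its memory pool from these two numbers would plan for an 8 GiB device regardless of the physical capacity of the card.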

[0048] Example 2. As shown in Figure 5, the present invention provides a GPU memory virtualization device based on deep parsing of the ioctl protocol, comprising: an ioctl system call request acquisition unit, which intercepts the ioctl system call requests initiated by the target container and obtains the ioctl system call requests used for GPU memory management; a video memory related instruction parsing unit, which, based on the intercepted ioctl system call requests, parses their parameter blocks and extracts video memory related instructions for subsequent auditing; an accurate video memory usage ledger acquisition unit, which, based on the extracted video memory related instructions, constructs a shadow object tree and performs intelligent auditing of video memory allocation and mapping operations to obtain an accurate video memory usage ledger; a control result acquisition unit, which, based on the accurate video memory usage ledger obtained from the audit, determines whether the quota has been exceeded and performs interception during the physical access control stage, obtaining a control result indicating whether instruction pass-through is allowed; and an implementation unit, which, based on the control result, performs view virtualization and returns the virtualized video memory information to the application in the target container to achieve transparent application adaptation, thereby completing precise virtualization isolation and multi-tenant secure sharing of GPU video memory.

[0049] In the ioctl system call request acquisition unit of this embodiment, intercepting the ioctl system call requests initiated by the target container specifically includes: taking over the ioctl function of the C standard library in user space by means of the Linux dynamic-linker preloading mechanism; and, based on the device magic number and command codes of the NVIDIA driver, filtering the memory allocation instructions, address mapping instructions, heap management instructions, and deallocation instructions out of all ioctl requests.

[0050] Example 3 The present invention provides an electronic device, including a processor and a memory, wherein the memory stores a computer program, and the processor executes the computer program to implement a GPU memory virtualization method based on deep parsing of the ioctl protocol.

[0051] The electronic device may also include one or more of the following: multimedia components, input / output (I / O) interfaces, and communication components.

[0052] The processor controls the overall operation of the electronic device to complete all or part of the steps of the GPU memory virtualization method described above. The memory stores various types of data to support the operation of the electronic device. This data may include, for example, instructions for any application or method operating on the electronic device, and application-related data such as contact data, sent and received messages, pictures, audio, and video. The memory can be implemented using any type of volatile or non-volatile storage device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk. Multimedia components may include a screen and audio components. The screen may be, for example, a touchscreen, and the audio components are used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signals may be further stored in memory or transmitted via a communication component. The audio component also includes at least one speaker for outputting audio signals. The I/O interface provides an interface between the processor and other interface modules, such as a keyboard, mouse, or buttons. These buttons may be virtual or physical. The communication component is used for wired or wireless communication between the electronic device and other devices. Wireless communication includes Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination thereof; therefore, the corresponding communication component may include a Wi-Fi module, a Bluetooth module, or an NFC module.

[0053] In an exemplary embodiment, the electronic device may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the GPU memory virtualization method described above.

[0054] Example 4 The present invention provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements a GPU memory virtualization method based on deep parsing of the ioctl protocol.

[0055] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0056] This application is described with reference to flowchart illustrations and/or block diagrams of methods, systems, and computer program products according to embodiments of this application. It will be understood that each process and/or block of the flowchart illustrations and/or block diagrams, and combinations of processes and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0057] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0058] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0059] The inventive points protected by this invention are as follows. This invention proposes a GPU memory virtualization method based on deep parsing of the ioctl protocol, and its core innovations are summarized below. First, a GPU memory isolation method based on user-space driver protocol interception: the method uses the dynamic linking mechanism to intercept, in user space, application calls to the operating system's ioctl interface. By parsing the proprietary binary communication protocol of the NVIDIA driver, it audits and controls GPU memory resources without modifying the operating system kernel.

[0060] Second, a heuristic parameter parsing technique based on memory characteristics: the technique does not rely on the structure definitions of any specific driver version. Instead, it traverses the protocol parameter memory block, identifies fields that match the page alignment characteristics and numerical range of video memory sizes, and uses a greedy maximum value matching strategy to extract the physical video memory allocation size, achieving adaptive compatibility with multiple driver versions.

[0061] Third, full-lifecycle management of shadow resource objects with an intelligent deduplication mechanism: the system reconstructs a view of the creation and destruction of video memory objects in user space. When it intercepts a video memory mapping instruction, it queries the historical status of the handle to determine whether the resource was already billed in the allocation stage, automatically filtering out duplicate usage counts and adding supplementary billing for handles that are not recorded.

[0062] Fourth, transparent view virtualization combined with quota admission control: by hooking the video memory query interfaces and rewriting the returned total and remaining video memory values, the system makes upper-layer applications adapt automatically to the limited video memory space. At the same time, it compares the real-time ledger with the configured quota at the underlying protocol layer, blocks physical video memory allocation requests that would exceed the quota, and returns an out-of-memory error.

[0063] The foregoing has shown and described the basic principles, main features, and advantages of the present invention. It will be apparent to those skilled in the art that the invention is not limited to the details of the exemplary embodiments described above, and that the invention can be implemented in other specific forms without departing from its spirit or essential characteristics. Therefore, the embodiments should be considered illustrative and non-limiting in all respects, and the scope of the invention is defined by the appended claims rather than the foregoing description. Thus, all variations falling within the meaning and scope of equivalents of the claims are intended to be included within the scope of the invention. No reference numerals in the claims should be construed as limiting the scope of the claims.

[0064] Furthermore, it should be understood that although this specification describes embodiments, not every embodiment contains only one independent technical solution. This narrative style is merely for clarity. Those skilled in the art should consider the specification as a whole, and the technical solutions in each embodiment can be appropriately combined to form other embodiments that can be understood by those skilled in the art. The above content is only for illustrating the technical concept of the present invention and should not be construed as limiting the scope of protection of the present invention. Any modifications made based on the technical concept proposed in this invention shall fall within the scope of protection of the claims of this invention.

Claims

1. A GPU memory virtualization method based on deep parsing of an ioctl protocol, characterized in that it comprises: intercepting the ioctl system call requests initiated by a target container to obtain the ioctl system call requests used for GPU memory management; based on the intercepted ioctl system call requests, parsing their parameter blocks and extracting memory-related instructions for subsequent auditing; based on the extracted memory-related instructions, constructing a shadow object tree and performing intelligent auditing of memory allocation and mapping operations to obtain an accurate memory usage ledger; based on the accurate memory usage ledger obtained from the audit, determining in a physical access control phase whether the quota has been exceeded and executing interception, to obtain a control result indicating whether instruction pass-through is allowed; and, based on the control result, implementing view virtualization and returning virtualized memory information to the application in the target container to achieve transparent application adaptation, thereby completing precise virtualization isolation and multi-tenant secure sharing of GPU memory.

2. The GPU memory virtualization method based on deep parsing of the ioctl protocol according to claim 1, characterized in that intercepting the ioctl system call requests initiated by the target container specifically includes: taking over the ioctl function of the C standard library in user space by means of the Linux dynamic-linker preloading mechanism; and, based on the device magic number and command codes of the NVIDIA driver, filtering the memory allocation instructions, address mapping instructions, heap management instructions, and deallocation instructions out of all ioctl requests.

3. The GPU memory virtualization method based on deep parsing of the ioctl protocol according to claim 1, characterized in that parsing the parameter block and extracting video memory-related instructions specifically includes: traversing the memory region pointed to by the ioctl parameter pointer and identifying fields that match the page alignment characteristics of 4KB, 64KB, or 2MB and lie within a reasonable numerical range; using a greedy maximum value matching strategy to select the largest reasonable value as the physical memory allocation size of the request; and simultaneously parsing the resource type identifier and executing the subsequent isolation logic for video memory resources.

4. The GPU memory virtualization method based on deep parsing of the ioctl protocol according to claim 1, characterized in that constructing the shadow object tree and performing intelligent auditing of video memory allocation and mapping operations specifically includes: maintaining a global shadow object tree in user space that records handles, video memory sizes, resource types, and ownership relationships; recording the initial billing for each handle successfully returned by a physical allocation instruction; and, for each subsequent mapping instruction, first querying the shadow tree, marking the instruction as deduplicated if the handle already exists, and performing supplementary billing if the handle is unknown, thereby keeping the ledger strictly synchronized with physical consumption in scenarios where allocation and mapping are decoupled.

5. The GPU memory virtualization method based on deep parsing of the ioctl protocol according to claim 1, characterized in that determining whether the quota has been exceeded and executing interception during the physical access control phase specifically includes: before each instruction that may increase memory usage is sent to the actual driver, calculating the sum of the current cumulative usage and the current request amount; and, if the sum exceeds the video memory quota set for the container, intercepting the ioctl request and returning a CUDA_ERROR_OUT_OF_MEMORY or ENOMEM error code to the application.

6. The GPU memory virtualization method based on deep parsing of the ioctl protocol according to claim 1, characterized in that implementing view virtualization based on the control result and returning virtualized video memory information to the application specifically includes: hooking the cuDeviceTotalMem, cuMemGetInfo, or cudaGetDeviceProperties video memory query interface; and returning to the application a virtualized total video memory capacity equal to the user-defined quota value and a remaining video memory capacity equal to the quota minus the current shadow tree usage, so that the cache allocators of the PyTorch and TensorFlow frameworks automatically converge to the quota range, achieving transparent adaptation without code modification.

7. A GPU memory virtualization device based on deep parsing of the ioctl protocol, characterized in that it comprises: an ioctl system call request acquisition unit, which intercepts the ioctl system call requests initiated by the target container and obtains the ioctl system call requests used for GPU memory management; a video memory related instruction parsing unit, which, based on the intercepted ioctl system call requests, parses their parameter blocks and extracts video memory related instructions for subsequent auditing; an accurate video memory usage ledger acquisition unit, which, based on the extracted video memory related instructions, constructs a shadow object tree and performs intelligent auditing of video memory allocation and mapping operations to obtain an accurate video memory usage ledger; a control result acquisition unit, which, based on the accurate video memory usage ledger obtained from the audit, determines whether the quota has been exceeded and performs interception during the physical access control stage, obtaining a control result indicating whether instruction pass-through is allowed; and an implementation unit, which, based on the control result, performs view virtualization and returns the virtualized video memory information to the application in the target container to achieve transparent application adaptation, thereby completing precise virtualization isolation and multi-tenant secure sharing of GPU video memory.

8. The GPU memory virtualization device based on deep parsing of the ioctl protocol according to claim 7, characterized in that, in the ioctl system call request acquisition unit, intercepting the ioctl system call requests initiated by the target container specifically includes: taking over the ioctl function of the C standard library in user space by means of the Linux dynamic-linker preloading mechanism; and, based on the device magic number and command codes of the NVIDIA driver, filtering the memory allocation instructions, address mapping instructions, heap management instructions, and deallocation instructions out of all ioctl requests.

9. An electronic device, characterized in that it includes a processor and a memory, wherein the memory stores a computer program, and the processor executes the computer program to implement the GPU memory virtualization method based on deep parsing of the ioctl protocol as described in any one of claims 1 to 6.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the GPU memory virtualization method based on deep parsing of the ioctl protocol as described in any one of claims 1 to 6.