A CET-aware adaptive extended page table (EPT) virtual machine introspection monitoring method

By building page-level semantic tags and a multi-level monitoring state machine in the virtualization platform, and using CET clues to adapt the monitoring intensity, the method addresses the problems of excessive monitoring overhead and insufficient security coverage under high-load scenarios, making the monitoring intensity interpretable and improving system stability.

CN122240240A · Pending · Publication Date: 2026-06-19 · CHONGQING UNIV OF POSTS & TELECOMM

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHONGQING UNIV OF POSTS & TELECOMM
Filing Date
2026-03-16
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Under high-load scenarios, existing EPT-based virtual machine introspection monitoring solutions suffer from excessive monitoring overhead and insufficient security coverage: it is difficult to reduce VM-exit and TLB refresh overhead while still keeping critical areas under long-term control.

Method used

Page-level semantic tags are constructed and combined with CET clues for adaptive monitoring. A multi-level monitoring state machine and EPT permission configuration dynamically adjust the monitoring intensity, subject to a coverage lower limit and a resource consumption upper limit, so that the system overhead of frequent permission flipping is reduced.

Benefits of technology

The method makes the monitoring intensity interpretable and adjustable under high-load scenarios, keeps critical areas under long-term control, reduces system overhead, improves stability, and is compatible with existing virtualization platforms.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a CET-aware adaptive extended page table (EPT) virtual machine introspection monitoring method, applied to a virtualization platform. On the host side, a semantic snapshot of the target process is constructed and page-level semantic tags are generated. On the virtual machine monitor side, when an EPT violation or single-step trap is triggered, write behavior and CET clue observations are updated, page-level value scores are calculated, and the monitored page is driven to migrate between relaxed, protected, amplified, and blocked states. Based on this, EPT permissions and altp2m views are configured. Simultaneously, constraints are applied to the critical area coverage lower limit and amplified duty cycle upper limit, and a delayed merge commit and TLB refresh merge strategy are adopted to reduce monitoring overhead while ensuring long-term control of critical areas.

Description

Technical Field

[0001] This invention relates to the field of virtualization security and virtual machine introspection (VMI / HVMI) technology, and in particular to a virtual machine introspection monitoring method for virtualization platforms that support Control-flow Enforcement Technology (CET). The method uses Extended Page Table (EPT) permission control and alternate physical memory mapping (alternate p2m, altp2m) capabilities to adaptively adjust page-level monitoring intensity while satisfying critical area coverage constraints.

Background Technology

[0002] Virtualization platforms are widely used in cloud computing and data center environments, where critical business processes typically run within guest virtual machines (VMs). To enable security auditing, intrusion detection, or anomaly analysis without modifying the guest operating system and applications, Virtual Machine Introspection (VMI / HVMI) technology observes and analyzes the memory and execution behavior of guest VMs on the hypervisor side, offering advantages such as strong isolation and high resilience.

[0003] In x86 hardware virtualization architectures, Extended Page Tables (EPT) provide the second level of address translation, from guest physical addresses to host physical addresses, and support read / write / execute permission control on page table entries. EPT-based page-level write protection monitoring has therefore become a common implementation method for agentless HVMI: write protection is set for the target page, and when a guest attempts to write to that page, an EPT violation is triggered, causing a virtual machine exit (VM-exit). The hypervisor captures the exit and then audits the access, raises an alert, or otherwise handles it.

[0004] However, existing technologies often face contradictions under high-load scenarios:

[0005] 1) If write protection is set for a large number of pages for a long time to improve monitoring coverage, EPT violations will occur frequently, VM-exit / VM-entry will increase, and cross-core TLB refresh may be triggered, resulting in significant performance overhead and tail latency jitter.

[0006] 2) If monitoring intensity is relaxed in order to reduce overhead, critical areas may be exposed for a long time, making it more difficult to detect or verify attacks such as control flow hijacking and self-modifying code in a timely manner.

[0007] Meanwhile, Intel CET provides control flow protection capabilities such as Shadow Stack (SHSTK) and Indirect Branch Tracking (IBT), with IBT using the ENDBR instruction as a constraint marker for indirect branch entry points. Shadow Stack pages and neighboring code pages containing ENDBR have higher security value in control flow hijacking attacks. If differentiated monitoring of different pages can be implemented on the Hypervisor side using CET clues, and if dynamic adjustment of monitoring intensity can be achieved through altp2m view switching and EPT permission programming, while introducing lower limits for critical area coverage and upper limits for resource consumption during amplification phases, monitoring overhead can be significantly reduced while ensuring long-term control of critical areas.

[0008] Therefore, a CET-aware adaptive EPT virtual machine introspection monitoring scheme is needed to solve the problems of excessive monitoring overhead and insufficient security coverage that existing technologies exhibit under high-load scenarios.

Summary of the Invention

[0009] This invention aims to solve the problem that EPT-based static write protection monitoring cannot simultaneously meet the following objectives under high-load scenarios:

[0010] Reduce the overhead of monitoring-related VM-exits, permission flips, and TLB refreshes while still keeping key areas such as shadow stack pages, critical code pages, and ENDBR neighbor pages under long-term control.

[0011] When suspicious behavior is detected, switch to stronger monitoring (amplified verification) or apply blocking within a short time, while keeping a controllable upper limit on resource consumption during the amplification phase.

[0012] To achieve the above objectives, this invention provides a CET-aware adaptive extended page table (EPT) virtual machine introspection monitoring method, applicable to a virtualization platform containing guest virtual machines, a hypervisor, and a host control domain (e.g., Dom0) or a host-side policy daemon. The method includes:

[0013] S1 Semantic Snapshot and Page-Level Classification (Slow Path): A semantic snapshot is constructed based on the target process binary file, runtime address mapping information, and CET configuration information. Page-level semantic tags are generated for the target process-related memory at the page granularity, and a set of key pages is determined. The set of key pages includes at least the target process-related code pages, shadow stack pages, and ENDBR neighboring pages. The mapping from page identifiers to semantic tags is synchronized to the Hypervisor-side cache structure.

[0014] S2 Event Capture and Hot Path Update: On the Hypervisor side, the EPT violation handling path and event callback path are attached. When a write protection violation or single-step trap of a monitored page triggers a VM-exit event, the semantic tag is queried according to the page identifier associated with the triggering event, and the runtime observations of that page are updated.

[0015] S3 Page-Level Value Scoring: A page-level value score that evolves over time is maintained for each monitored page to characterize the security value and risk level of the page at the current moment.

[0016] S4 Multi-level Monitoring State Machine: Based on page-level value scoring, a multi-level monitoring state machine is driven to perform state transitions; the state machine includes at least a relaxed state L0, a guarded state L1, an amplified state L2, and a blocked state B, and supports limiting the minimum monitoring intensity of high-value critical pages to no less than L1.

[0017] S5 View Selection and EPT Permission Configuration: Under different monitoring states, different EPT permission combinations and / or different altp2m views can be selected for the same monitored page. "View selection + permission configuration" is used as a unified abstraction of monitoring actions, thereby achieving relaxed operation, protective monitoring, amplified verification and blocking.

[0018] S6 Intrinsic Security Constraints: A lower limit constraint on critical area coverage and an upper limit constraint on the amplification duty cycle are introduced and incorporated into scheduling decisions; when a state transition would cause the coverage to fall below the lower limit or the duty cycle to exceed the upper limit, the transition is rejected or delayed, or monitoring resources are reallocated, so that the constraints are met.

[0019] S7 Update Overhead Control: For EPT permission updates or altp2m view switching, a delayed and merged batch commit strategy is adopted, and cross-core TLB refresh requests are merged to reduce the system overhead caused by frequent permission flips.

[0020] S8 Closed-Loop Tuning: Telemetry data and alarm results from the monitoring side are sent back to the slow path policy daemon to update parameters such as scoring weights, state machine thresholds, coverage lower limit, and duty cycle upper limit, forming a closed-loop control.

This invention also provides an electronic device, including a processor and a memory, wherein the processor executes a program stored in the memory to implement the above method; this invention also provides a computer-readable storage medium storing a program which, when executed by a processor, implements the above method.

[0021] Beneficial results of the present invention:

[0022] Compared with the prior art, the present invention has at least the following beneficial effects:

[0023] 1) By constructing page-level value scores from page-level semantic tags, write behavior, and CET clues, the monitoring intensity becomes interpretable, adjustable, and adaptive;

[0024] 2) By using the coverage lower limit constraint, shadow stack pages and critical code pages are kept under long-term control, so that critical security coverage is not sacrificed in pursuit of performance;

[0025] 3) By using the duty cycle upper limit and the budget allocation mechanism, the resource consumption during the amplification phase is restricted, so that the expenditure has a controllable upper limit;

[0026] 4) By using the delayed merge commit and TLB refresh merge mechanism for EPT / view updates, the system overhead caused by frequent permission flips is reduced, and throughput and tail latency stability are improved;

[0027] 5) Neither the guest operating system nor the application binaries need to be modified, so the method is highly compatible and can be deployed on existing virtualization platforms.

Attached Figure Description

[0028] Figure 1: A schematic diagram of the overall structure of the CET-aware adaptive EPT virtual machine introspection monitoring system of the present invention; Figure 2: A flowchart of the CET-aware adaptive EPT virtual machine introspection monitoring method of the present invention.

Detailed Implementation

[0029] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0030] (i) Overall System Structure

[0031] This embodiment includes:

[0032] 1) Guest Virtual Machine: Runs the guest operating system and business processes, including the target process and its related memory areas;

[0033] 2) Lightweight monitoring component on the Hypervisor side: It hooks into the EPT fault handling and event callback paths to perform constant-time statistics updates on the hot path, score calculation, and state machine decisions, and it executes EPT permission configuration and altp2m view switching;

[0034] 3) Host-side policy daemon (e.g., Dom0 side): used for slow path semantic parsing and page classification, parameter maintenance (weight / threshold / window / budget), global constraint statistics (coverage / duty cycle) and policy distribution;

[0035] 4) Control channel: used for semantic tag synchronization, telemetry feedback and parameter distribution.

[0036] (ii) Semantic snapshots and page-level classification

[0037] The policy daemon collects the target process's binary and runtime mapping information, as well as CET configuration, to generate page-level semantic tags. These semantic tags include at least: read-only code pages, regular code pages, regular data pages, heap pages, stack pages, shadow stack pages, ENDBR neighbor pages, and JIT-related read / write pages. Furthermore, based on the semantic tags, key sets are generated, such as key code page sets and shadow stack page sets, and synchronized to the Hypervisor-side cache structure. To reduce hot path overhead, semantic resolution and updates are performed in slow path batch processing; updates are triggered when a module is loaded, mappings change, or CET configuration changes.
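
The tagging step above can be illustrated with a minimal sketch. It assumes pages are identified by guest frame numbers and that the key set consists of code, shadow stack, and ENDBR neighbor pages; the names `PageTag`, `KEY_TAGS`, and `build_key_set`, and the sample frame numbers, are hypothetical and do not come from the patent text.

```python
from enum import Enum, auto


class PageTag(Enum):
    """Page-level semantic tags listed in the embodiment."""
    READ_ONLY_CODE = auto()
    CODE = auto()
    DATA = auto()
    HEAP = auto()
    STACK = auto()
    SHADOW_STACK = auto()
    ENDBR_NEIGHBOR = auto()
    JIT_RW = auto()


# Assumption: which tags count as "key". The text names code pages,
# shadow stack pages, and ENDBR neighbor pages as the minimum.
KEY_TAGS = {
    PageTag.READ_ONLY_CODE,
    PageTag.CODE,
    PageTag.SHADOW_STACK,
    PageTag.ENDBR_NEIGHBOR,
}


def build_key_set(tag_map):
    """Return the guest frame numbers whose tag belongs to KEY_TAGS."""
    return {gfn for gfn, tag in tag_map.items() if tag in KEY_TAGS}


# Hypothetical snapshot: guest frame number -> semantic tag.
snapshot = {
    0x1000: PageTag.READ_ONLY_CODE,
    0x1001: PageTag.HEAP,
    0x1002: PageTag.ENDBR_NEIGHBOR,
    0x2000: PageTag.SHADOW_STACK,
}
key_pages = build_key_set(snapshot)
```

In this sketch the slow path would rebuild `snapshot` and `key_pages` in a batch whenever a module is loaded or the mappings change, and only the finished maps would be pushed to the Hypervisor-side cache.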

[0038] (iii) Event Capture and Statistical Updates

[0039] Hypervisor-side monitoring component captures:

[0040] • EPT violation: When a page is write-protected / execute-restricted, access triggers an EPT fault, leading to VM-exit;

[0041] • Single-step trap: Captures finer-grained behavior when the page is in the L2 amplified state and single-stepping is enabled;

[0042] • Event callback: Reports events to the policy daemon using vm_event or an equivalent mechanism.

On the hot path, the semantic tag is looked up by page identifier, and the write behavior statistics and CET clue statistics are incrementally updated to provide data for score calculation.
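
A minimal sketch of such a hot-path update is given below. It assumes string tags, a per-page counter record, and, as a simplification not stated in the text, that every captured write to a shadow stack page or ENDBR neighbor page counts as one CET clue. `PageObs` and `on_ept_write_violation` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PageObs:
    """Per-page runtime observations updated on each captured event."""
    write_count: int = 0
    bytes_written: int = 0
    cet_clues: int = 0


# Assumption: tags whose writes are treated as CET clues in this sketch.
CET_SENSITIVE_TAGS = ("shadow_stack", "endbr_neighbor")


def on_ept_write_violation(obs_table, tag_cache, gfn, nbytes):
    """Look up the tag by page identifier and update counters in O(1)."""
    tag = tag_cache.get(gfn, "data")  # unknown pages default to plain data
    obs = obs_table.setdefault(gfn, PageObs())
    obs.write_count += 1
    obs.bytes_written += nbytes
    if tag in CET_SENSITIVE_TAGS:
        obs.cet_clues += 1
    return tag, obs
```

The handler does only a dictionary lookup and a few increments, which matches the intent of keeping the hot path lightweight and leaving parsing and classification to the slow path.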

[0043] (iv) Page-level value scoring and state machine

[0044] On the hot path, the page-level value score $S(p,t)$ is calculated from the page-level semantic prior $\mathrm{Sem}(p)$, the write behavior amount $\mathrm{Wr}(p,t)$, the CET clue amount $\mathrm{Cet}(p,t)$, and the prior enhancement amount $\mathrm{Pri}(p)$. The score can be expressed in a linearly separable multi-factor combination form as follows:

$$S(p,t) = w_1\,\mathrm{Sem}(p) + w_2\,\mathrm{Wr}(p,t) + w_3\,\mathrm{Cet}(p,t) + w_4\,\mathrm{Pri}(p),$$

where $w_1$, $w_2$, $w_3$, $w_4$ are non-negative weights and […].

[0045] Here, $\mathrm{Sem}(p)$ characterizes the static semantic importance of page p; $\mathrm{Wr}(p,t)$ characterizes the number of writes, the number of bytes written, or the write span of page p near time t; $\mathrm{Cet}(p,t)$ characterizes shadow stack anomaly writes, ENDBR neighborhood rewrites, or other high-confidence clues related to CET; and $\mathrm{Pri}(p)$ characterizes prior enhancements given by historical alerts, policy blacklists, or preceding evidence chains.
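
The weighted combination described above can be sketched as follows. The four inputs are assumed to be pre-normalized to the range 0 to 1, and the default weight values are placeholders chosen for illustration only; the patent text does not give concrete weights. `page_score` is a hypothetical name.

```python
def page_score(sem, wr, cet, pri, weights=(0.4, 0.2, 0.3, 0.1)):
    """Linear multi-factor page score.

    sem: static semantic prior of the page
    wr:  recent write behavior amount
    cet: recent CET clue amount
    pri: prior enhancement (historical alerts, blacklists, ...)
    weights: hypothetical non-negative weights, one per factor
    """
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    w1, w2, w3, w4 = weights
    return w1 * sem + w2 * wr + w3 * cet + w4 * pri
```

Because each factor enters as a separate term, the contribution of any single factor to the score can be read off directly, which is one way to interpret the "explainable monitoring intensity" benefit claimed for the scheme.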

[0046] The scoring-driven state machine transitions between L0, L1, L2, and B.

[0047] L0: Permissive operation; L1: Guardian monitoring (e.g., write protection auditing); L2: Enhanced verification (stricter permissions and / or single-step); B: Blocking (prohibits write and / or execution and issues an alert).

[0048] For high-value pages such as shadow stack pages, critical code pages, and ENDBR neighbor pages, the minimum state is limited to no less than L1, and a dual threshold and minimum residence time mechanism is used to suppress state oscillations.
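
A minimal sketch of a state machine with these properties follows. The threshold values, the separate blocking threshold, the one-level-at-a-time downgrade, and the choice to apply the minimum residence time only to downgrades are all assumptions made for illustration; the text specifies only the dual threshold, the minimum residence time, and the L1 floor for high-value pages.

```python
from enum import IntEnum


class State(IntEnum):
    L0 = 0  # relaxed
    L1 = 1  # guarded
    L2 = 2  # amplified
    B = 3   # blocked


# Hypothetical parameters; in the scheme they are tuned by the policy daemon.
UP_THRESHOLD = 0.7
DOWN_THRESHOLD = 0.3
BLOCK_THRESHOLD = 0.9
MIN_DWELL = 5  # minimum ticks in a state before a downgrade is allowed


def next_state(state, score, ticks_in_state, is_key_page):
    """Return the next monitoring state for one page."""
    floor = State.L1 if is_key_page else State.L0
    state = max(state, floor)           # key pages never drop below L1
    if state == State.B:
        return State.B                  # assumption: release is a policy decision
    if score >= BLOCK_THRESHOLD:
        return State.B
    if score >= UP_THRESHOLD:
        return State.L2                 # escalate immediately
    if (score <= DOWN_THRESHOLD
            and ticks_in_state >= MIN_DWELL
            and state > floor):
        return State(state - 1)         # step down one level
    return state                        # inside the hysteresis band: hold
```

Scores between the two thresholds leave the state unchanged, so a page whose score hovers around a single value does not flip back and forth between monitoring levels.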

[0049] (v) View selection and EPT permission configuration (corresponding to step S5)

[0050] Use altp2m to build at least a "Running View" and a "Monitoring / Blocking View", and select different permission combinations for different states:

[0051] L0: Run view, with relatively relaxed permissions;

[0052] L1: Enable write protection / auditing for the target page on the run view;

[0053] L2: Switch to monitoring view, enable stricter write protection and can be used with single-step traps;

[0054] B: Switch to the blocking view, prohibiting writing and / or execution.

Through this mapping, the state machine outputs directly executable "view switching + permission programming" actions.
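
The mapping from monitoring state to a "view selection + permission configuration" action can be sketched as a lookup table. The view names and the exact permission bits per state are assumptions for illustration (for example, state B is shown here prohibiting both write and execute); `Action`, `ACTIONS`, and `action_for` are hypothetical names and not part of any hypervisor API.

```python
from typing import NamedTuple


class Action(NamedTuple):
    """One monitoring action: an altp2m view plus EPT permissions."""
    view: str
    readable: bool
    writable: bool
    executable: bool
    single_step: bool


# Hypothetical state -> action table following the four states in the text.
ACTIONS = {
    "L0": Action("run", True, True, True, False),       # relaxed
    "L1": Action("run", True, False, True, False),      # write-protect and audit
    "L2": Action("monitor", True, False, True, True),   # stricter, single-step
    "B": Action("block", True, False, False, False),    # no write, no execute
}


def action_for(state):
    """Return the action to program for a monitoring state."""
    return ACTIONS[state]
```

Keeping the table as data rather than code means the state machine only emits a state name, and the component that programs the EPT entries and switches views needs no knowledge of the scoring logic.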

[0055] (vi) Coverage and duty cycle constraints

[0056] This embodiment applies a lower limit constraint on coverage for key areas.

[0057] Define the controlled state set as {L1, L2, B}. Within the statistical window $W$, the key area coverage rate $\mathrm{Cov}(P, W)$ is determined from the percentage of pages in the key page set that are in a controlled state, the percentage of controlled dwell time, or a combination thereof, and is required not to fall below the preset lower limit $\mathrm{Cov}_{\min}$.

[0058] Here $P$ represents the set of key pages and $W$ represents the statistics window. The value of $\mathrm{Cov}(P, W)$ characterizes the degree to which the set $P$ remains under control within the window $W$.
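
One of the coverage variants mentioned above, the percentage of controlled dwell time, can be sketched as follows. The data layout (per-page dwell time per state within the window) and the function names are assumptions for illustration.

```python
CONTROLLED = {"L1", "L2", "B"}


def coverage(dwell, window):
    """Fraction of key-page time spent in a controlled state.

    dwell:  {page id: {state name: time spent in that state in the window}}
            for the pages of the key page set (hypothetical layout)
    window: length of the statistics window, in the same time unit
    """
    if not dwell or window <= 0:
        return 0.0
    controlled_time = sum(
        t
        for per_page in dwell.values()
        for state, t in per_page.items()
        if state in CONTROLLED
    )
    return controlled_time / (len(dwell) * window)


def violates_floor(dwell, window, cov_min):
    """True if the measured coverage is below the preset lower limit."""
    return coverage(dwell, window) < cov_min
```

For example, with two key pages over a window of 10 time units, one page in L1 for the whole window and the other split evenly between L0 and L2, the controlled time is 15 out of 20, giving a coverage of 0.75.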

[0059] At the same time, an upper limit constraint on the duty cycle is imposed on the amplification stage.

[0060] Within the statistics window $W$, the amplification duty cycle $D(W)$ is determined from the dwell time of monitored pages in the L2 amplified state, the number of pages in the L2 amplified state, or a combination thereof, and $D(W)$ is kept no higher than the preset upper limit $D_{\max}$. If a state transition would cause the coverage to fall below the lower limit or the duty cycle to exceed the upper limit, the scheduling module rejects the transition, postpones its execution, or reallocates the budget. The host-side policy daemon maintains the available L2 residency budget in each control window and allocates it preferentially to pages with higher scores, so that the security coverage and resource upper limit constraints are both met.
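
The duty cycle check and the score-ordered budget allocation can be sketched as below. The sketch assumes the dwell-time variant of the duty cycle and a budget counted in page slots; both function names and all parameters are hypothetical.

```python
def admit_l2_transition(l2_time_used, requested, window, num_pages, duty_max):
    """Admit an L0/L1 -> L2 transition only if the duty cycle stays bounded.

    The duty cycle here is the L2 dwell time divided by the total
    page-time available in the window (an assumed definition).
    """
    duty = (l2_time_used + requested) / (window * num_pages)
    return duty <= duty_max


def allocate_l2_budget(candidates, budget_slots):
    """Grant L2 residency to the highest-scoring pages first.

    candidates:   {page id: current page-level value score}
    budget_slots: number of pages that may reside in L2 this window
    """
    ranked = sorted(candidates, key=lambda gfn: candidates[gfn], reverse=True)
    return set(ranked[:budget_slots])
```

A transition that fails `admit_l2_transition` would be rejected or postponed to a later window in this sketch, while pages not selected by `allocate_l2_budget` remain in their current state.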

[0061] (vii) Delayed merge submission

[0062] To reduce the TLB refresh and VM-exit overhead caused by frequent permission flips, EPT / view update requests are written to a pending queue and committed together at preset batch processing points. Cross-core TLB refresh requests are merged so that the cost of multiple updates is amortized, thereby improving performance stability under high-load scenarios.
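
A minimal sketch of the pending queue is shown below. It assumes that a later request for the same page supersedes an earlier one and that one merged refresh is issued per committed batch; the class name, the permission strings, and the `apply_fn` callback are hypothetical stand-ins for the real EPT programming path.

```python
class PendingUpdates:
    """Coalesces permission updates and commits them in batches."""

    def __init__(self):
        self._pending = {}  # page id -> latest requested permissions
        self.flushes = 0    # number of merged refreshes issued

    def request(self, gfn, perms):
        """Queue an update; a later request for the same page overwrites."""
        self._pending[gfn] = perms

    def commit(self, apply_fn):
        """Apply all queued updates, then account for one merged refresh."""
        if not self._pending:
            return 0
        batch, self._pending = self._pending, {}
        for gfn, perms in batch.items():
            apply_fn(gfn, perms)
        self.flushes += 1
        return len(batch)
```

If a page flips from protected to relaxed and back between two batch points, only the final permission is programmed, and all pages in the batch share a single refresh instead of triggering one each.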

[0063] (viii) Closed-loop optimization and alarm output

[0064] The policy daemon periodically aggregates coverage, duty cycle, EPT violation count, TLB refresh count, tail latency telemetry, and alarm information; updates the weight coefficients, threshold parameters, window and budget parameters, coverage lower limit, and duty cycle upper limit; and sends them to the Hypervisor side through the control channel. Alarms include at least shadow stack abnormal writes, ENDBR neighborhood rewriting, and suspicious self-modifying code; blocking and reporting are triggered when a page enters the blocked state B.

Claims

1. A CET-aware adaptive extended page table (EPT) virtual machine introspection monitoring method, characterized in that the method is applied to a virtualization platform containing guest virtual machines, a virtual machine monitor, and a host-side policy daemon, and includes the following steps:

S1: On the slow path side, a semantic snapshot is constructed based on the target process's binary file, runtime mapping information, and CET configuration information. The guest address space is partitioned at page granularity and labeled with page-level semantic tags, and the page-level semantic tags are synchronized to the cache structure on the virtual machine monitor side.

S2: On the hot path side, when an EPT violation or a single-step trap triggers a virtual machine exit event, the page-level semantic tags are queried based on the guest physical page identifier associated with the triggering event, and the runtime observations of the page are updated.

S3: Based on the page-level semantic tags and the runtime observations, a page-level value score that evolves over time is calculated for the page, used to characterize the page's security value and risk level at the current moment.

S4: Based on the page-level value score, a multi-level monitoring state machine is maintained for each monitored page. The state machine is used to drive the upgrade, downgrade, and blocking decisions of the page's monitoring strength.

S5: Under different monitoring states, different EPT permission combinations and different altp2m views are selected for the same guest physical page. View selection and permission configuration are treated as a unified abstraction of monitoring actions to achieve relaxed operation, protective monitoring, amplified verification, and blocking.

S6: A lower limit constraint on critical area coverage and an upper limit constraint on the amplification duty cycle are introduced at the global level and incorporated into the scheduling decision, so that long-term control of critical areas is not sacrificed during performance optimization and resource consumption during the amplification phase is limited.

S7: When EPT permissions or altp2m views need to be adjusted, a delayed and merged batch commit strategy is adopted for EPT modification operations, and cross-core TLB refresh requests are merged to amortize the update cost of a single event.

S8: Telemetry data from the monitoring process is sent back to the slow path side for policy parameter management and optimization, forming a closed-loop control.

2. The method according to claim 1, characterized in that the page-level semantic tags include at least: read-only code pages, shadow stack pages and their management metadata pages, neighboring code pages containing ENDBR instructions, read-write pages related to JIT, and ordinary data pages.

3. The method according to claim 1, characterized in that the semantic snapshot is updated when any of the following conditions is met: the target process loads a new module, a large-scale memory mapping change occurs, or CET-related configurations are changed; the update is completed by batch processing on the slow path side, while the hot path only executes lightweight queries based on page identifiers.

4. The method according to claim 1, characterized in that the page-level value score $S(p,t)$ uses a linearly separable multi-factor combination satisfying $S(p,t) = w_1\,\mathrm{Sem}(p) + w_2\,\mathrm{Wr}(p,t) + w_3\,\mathrm{Cet}(p,t) + w_4\,\mathrm{Pri}(p)$, where $\mathrm{Sem}(p)$ represents the static semantic prior value of page p, $\mathrm{Wr}(p,t)$ represents the write behavior amount of page p at time t, $\mathrm{Cet}(p,t)$ represents the CET clue amount of page p at time t, $\mathrm{Pri}(p)$ represents the prior enhancement amount of page p, and $w_1$, $w_2$, $w_3$, $w_4$ are all non-negative weights and […].

5. The method according to claim 4, characterized in that the write behavior amount is obtained by counting, within a preset sliding time window, the number of writes, the number of bytes written, or the write span, as well as the source thread or context of the writes, so as to distinguish between short burst writes and continuous high-frequency writes.

6. The method according to claim 1, characterized in that the multi-level monitoring state machine includes four stable states: L0 (Relaxed), L1 (Guardian), L2 (Amplified), and B (Blocked). L0 corresponds to relaxed permissions in the running view; L1 corresponds to overlaying local write protection or audit points on the running view; L2 corresponds to switching to a monitoring view with write protection and single-step execution enabled; and B corresponds to a blocking view that prohibits writing and / or execution. The state machine employs a dual-threshold and minimum dwell time mechanism. When the score exceeds the upper threshold, the page is upgraded from L0 or L1 to L2 or enters B. When the score falls below the lower threshold and remains stable within the observation window, the page is downgraded from L2 or L1. For shadow stack pages or ENDBR neighboring code pages, the minimum state is limited to no less than L1.

7. The method according to claim 6, characterized in that the controlled state set is defined as {L1, L2, B}; within the statistical window W, the key area coverage rate is determined according to the percentage of pages in the key page set that are in a controlled state, the percentage of controlled dwell time, or a combination thereof, and the key area coverage rate is ensured to be not lower than a preset lower limit.

8. The method according to claim 6, characterized in that, within the statistics window $W$, the amplification duty cycle $D(W)$ is determined based on the dwell time of monitored pages in the L2 amplified state, the number of pages in the L2 amplified state, or a combination thereof, and $D(W)$ is kept no higher than the preset upper limit $D_{\max}$. The host-side policy daemon maintains the available L2 residency budget within each control window and allocates the budget preferentially to pages with higher scores.

9. A CET-aware adaptive EPT virtual machine introspection monitoring system for implementing the method of any one of claims 1 to 8, characterized in that the system includes:

a semantic snapshot and page classification module, used to build and update page-level semantic tags on the slow path side;

an event capture and hot path decision module, used to capture virtual machine exit events triggered by EPT violations or single-step traps on the virtual machine monitor side and to update the observations;

a page-level scoring module, used to calculate the page-level value score based on page-level semantic tags, write behavior amount, CET clue amount, and prior enhancement amount;

a monitoring state machine and scheduling module, used to drive the L0, L1, L2, and B state transitions based on page-level value scores and to enforce the lower limit constraint on critical area coverage and the upper limit constraint on the amplification duty cycle;

an EPT and altp2m programming commit module, used to execute altp2m view selection, EPT permission configuration, and delayed merge commits;

a telemetry feedback and strategy tuning module, used to feed back monitoring data and update strategy parameters.

10. An electronic device comprising a processor and a memory, the memory storing a computer program that, when executed by the processor, implements the method of any one of claims 1 to 8.

11. A computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the method of any one of claims 1 to 8.