Non-uniform memory access data multi-copy method and device based on jump label

By inserting static probe points with jump tags into the Linux kernel and using a multi-dimensional hotspot scoring algorithm, local replicas of NUMA nodes are dynamically created, solving the latency and resource waste problems of cross-node access under the NUMA architecture, and achieving real-time optimization and transparent performance improvement.

CN122240527A · Pending · Publication Date: 2026-06-19 · KYLIN CORP

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: KYLIN CORP
Filing Date: 2026-03-25
Publication Date: 2026-06-19


Abstract

This invention discloses a method and apparatus for multiple copies of non-uniform memory access data based on jump tags, relating to the field of remote data access optimization technology. The method includes: inserting static probe points based on jump tags into the critical path of the Linux kernel, initially in a disabled state; collecting remote memory page access data using a background monitoring thread, and identifying hot pages using a multi-dimensional hotspot scoring algorithm; dynamically enabling the corresponding jump tags in response to the identification results; and triggering the jump tag to perform a local copy creation operation when the process accesses the hot page again, creating a copy on the target NUMA node and updating the page table mapping. By using the default disabled static probe points, zero overhead is achieved in the code execution path. By dynamically enabling jump tags, the latency from decision to replication is shortened to the next memory access, achieving instant optimization response, transparency to the application, and significantly reducing system access latency and resource overhead.

Description

Technical Field

[0001] This invention relates to the field of remote data access optimization technology, and in particular to a jump-tag-based method and apparatus for multi-copy replication of non-uniform memory access (NUMA) data.

Background Technology

[0002] In modern high-performance computing and large-scale server systems, Non-Uniform Memory Access (NUMA) architecture has become mainstream. In a NUMA architecture, each NUMA node has its own dedicated local memory. When a program or process running on a NUMA node needs to access data on other NUMA nodes, additional latency and performance overhead are inevitable. Data replication effectively solves this problem by copying memory pages from remote nodes to the local node, eliminating the need for programs or processes to access data across nodes and successfully avoiding the performance latency issues caused by cross-node access in NUMA architectures.

[0003] However, in earlier versions of the Linux kernel, some function calls and conditional checks incurred performance overhead. This was especially true when using data replication for remote data access and cross-node data copying, where frequent function calls, conditional checks, and remote access increased business response latency and wasted significant CPU resources.

Summary of the Invention

[0004] This invention provides a method and apparatus for non-uniform memory access data with multiple copies based on jump tags, in order to solve the technical problem of resource waste caused by multiple data copies when accessing remotely across NUMA nodes.

[0005] In a first aspect, embodiments of the present invention provide a method for multiple copies of non-uniform memory access data based on jump tags, including:

S101: Static probe points based on jump tags are inserted into the Linux kernel.

S102: A background monitoring thread continuously collects memory page access data of remote NUMA nodes and uses a preset multi-dimensional hotspot scoring algorithm to calculate hotspot scores. When the hotspot score exceeds the preset hotspot threshold, the corresponding memory page is identified as a hot memory page and a local page replica creation operation is initiated.

S103: In response to the local page replica creation operation, the jump tag corresponding to the hot memory page is enabled through an atomic operation, so that the static probe point based on the jump tag is switched from the disabled state to the enabled state, and the jump tag key value corresponding to the hot page is dynamically allocated.

S104: When a process accesses the hot memory page again, the static probe point based on the jump tag is triggered. The preset local copy creation operation is then performed through the jump tag to create a local copy of the hot memory page on the target NUMA node and update the page table mapping of the process.

[0006] Secondly, embodiments of the present invention provide a non-uniform memory access data multiple copy device based on jump tags, comprising:

A probe configuration module, used to insert static probe points based on jump tags into the Linux kernel;

A hot page evaluation module, used to monitor the access data of memory pages of remote NUMA nodes and perform multi-dimensional hot page score calculation. When the hot page score exceeds the preset hot page threshold, the corresponding memory page is identified as a hot memory page and a local page copy creation operation is initiated;

A jump tag configuration module, used to respond to the local page copy creation operation, enable the jump tag corresponding to the hot memory page through an atomic operation, and dynamically allocate the jump tag key value corresponding to the hot page;

A local copy creation module, used to perform a local copy creation operation through the jump tag when a process accesses a hot memory page again. It creates a local copy of the hot memory page on the target NUMA node and updates the page table mapping of the process.

[0007] Thirdly, embodiments of the present invention provide an electronic device, including: one or more processors; and a storage device for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement the above-described method for multiple copies of non-uniform memory access data based on jump tags.

[0008] Fourthly, embodiments of the present invention provide a storage medium containing computer-executable instructions, which, when executed by a computer processor, are used to perform the aforementioned method for multiple copies of non-uniform memory access data based on jump tags.

[0009] This invention provides a method and apparatus for multiple replicas of non-uniform memory access data based on jump tags. The method inserts static probe points based on jump tags into the Linux kernel and initializes them to a disabled state to avoid performance overhead on the default path. It then monitors memory page access data of remote NUMA nodes and performs multi-dimensional hotspot scoring to identify hot memory pages. In response to the hotspot identification result, it dynamically enables the corresponding jump tag through an atomic operation, switching the static probe point to an enabled state. Subsequently, when a process accesses a hot memory page again, the enabled jump tag is triggered and a preset local replica creation operation is executed, creating a local replica on the target NUMA node and updating the page table mapping. Because the static probe points are disabled by default, no branch judgment overhead is incurred when the code executes the default path, so system performance is not disturbed. Because jump tags are enabled dynamically, the latency from decision to replication trigger is shortened to the next memory access, avoiding the wait for periodic sampling and achieving a near-instantaneous optimization response. The mechanism is completely transparent to upper-layer applications, requires no modification to application code and no user intervention, and significantly reduces overall system access latency and resource overhead.

Attached Figure Description

[0010] The accompanying drawings, which form part of this invention, are used to provide a further understanding of the invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not constitute an undue limitation of the invention. In the drawings:

Figure 1 is a flowchart of a non-uniform memory access data multiple copy method based on jump tags according to Embodiment 1 of the present invention;

Figure 2 is a flowchart of a non-uniform memory access data multiple copy method based on jump tags according to Embodiment 2 of the present invention;

Figure 3 is a schematic diagram of a non-uniform memory access data multiple copy device based on jump tags according to Embodiment 3 of the present invention;

Figure 4 is a structural diagram of the electronic device described in Embodiment 4 of the present invention.

Detailed Implementation

[0011] The present invention will now be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and not intended to limit it. Furthermore, it should be noted that, for ease of description, the accompanying drawings show only the parts relevant to the present invention, and not all of the structures.

[0012] Embodiment 1

Figure 1 is a flowchart of a non-uniform memory access data multi-copy method based on jump tags according to Embodiment 1 of this invention. By embedding static probe points, the method dynamically monitors how hot memory pages are, triggers the local creation of page copies, dynamically allocates jump tag key values, and dynamically reclaims idle old copies, thereby achieving dynamic multi-copy replication of memory pages. Specifically, it includes the following steps:

S101: Static probe points based on jump tags are inserted into the Linux kernel.

[0013] During the Linux kernel compilation phase, static probe points based on jump labels are inserted into critical memory access paths. A jump label is a branching technique with relatively low resource overhead. When a memory page is marked as a hotspot because it is accessed frequently, the jump label can be used to enable the copy path, which copies the memory page to create a local copy on the target node where the process resides. In particular, the static probe points are inserted at the entry points of kernel hot paths such as page fault handling and task scheduling migration, and the jump labels are disabled at initialization. In the disabled state, the jump label mechanism of the Linux kernel ensures that the CPU executes the default path exactly as it would execute the native path without inserted probes, theoretically achieving zero performance overhead.

[0014] S102: A background monitoring thread continuously collects memory page access data of remote NUMA nodes, and hotspot scores are calculated using a preset multi-dimensional hotspot scoring algorithm. When the hotspot score exceeds the preset hotspot threshold, the corresponding memory page is identified as a hot memory page, and a local page replica creation operation is initiated.

[0015] Utilizing an existing kernel background monitoring system, memory page access data from remote NUMA nodes is continuously collected. For example, the kernel's monitoring and decision-making module can sample data using a lightweight performance monitoring counter (PMC), record page fault events or task scheduler events, and collect access records for each memory page. For each memory page, a pre-defined multi-dimensional hotspot scoring algorithm is used to calculate its hotspot score. This algorithm comprehensively considers the remote access ratio, access frequency, access intensity, and migration benefits of the memory page, evaluating its hotspot status from multiple dimensions. When the score exceeds a pre-defined hotspot threshold, it indicates that the memory page has a high access volume and can be considered a hotspot. Copying it locally can reduce access latency and resource overhead to some extent, at which point a local page copy creation operation is initiated.

[0016] S103, in response to the local page copy creation operation, the jump tag corresponding to the hot memory page is enabled through an atomic operation, so that the static probe point based on the jump tag is switched from the disabled state to the enabled state, and the jump tag key value corresponding to the hot page is dynamically allocated.

[0017] In response to the local page copy creation operation, the kernel first locates the key of the specific jump label (jump_label) associated with the NUMA data copy operation. Then, by calling `static_branch_enable(&numa_copy_needed[key])`, the Linux kernel's atomic code patching mechanism atomically enables the static probe point embedded with the jump label, ensuring that any subsequent CPU execution reaching this probe point will immediately jump to the `handle_numa_copy()` function. Simultaneously, the kernel locates the jump label key value corresponding to the hot memory page. Based on the availability of local jump label key values, it dynamically allocates and maintains the mapping relationship between hot memory pages and jump label key values, including the mapping relationship between the key value and the triplet of the current node's memory management structure (mm_struct), the page identifier (page structure pointer), and the target NUMA node.
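As an illustration only, the enable-and-dispatch flow of [0017] can be modelled in user-space Python. Python cannot reproduce the kernel's code patching, so a per-key boolean stands in for the patched branch and only the control flow is shown. `ProbeTable`, `bind`, and `hit` are hypothetical names introduced for this sketch; `handle_numa_copy` mirrors the function named in the text, but its body here is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# (mm identifier, page identifier, target NUMA node): the triplet from [0017].
Triplet = Tuple[int, int, int]


@dataclass
class ProbeTable:
    """Toy stand-in for an array of jump label keys (numa_copy_needed[])."""
    size: int
    enabled: List[bool] = field(init=False)
    binding: Dict[int, Triplet] = field(default_factory=dict)
    copies: List[Triplet] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Every key starts disabled, so the default path does no extra work.
        self.enabled = [False] * self.size

    def bind(self, key: int, triplet: Triplet) -> None:
        """Record key -> (mm, page, node), then enable the key."""
        self.binding[key] = triplet
        # Models static_branch_enable(&numa_copy_needed[key]).
        self.enabled[key] = True

    def hit(self, key: int) -> Optional[Triplet]:
        """Models execution reaching the probe point guarded by `key`."""
        if not self.enabled[key]:
            return None  # default path: fall through, nothing happens
        return self.handle_numa_copy(key)

    def handle_numa_copy(self, key: int) -> Triplet:
        # Placeholder body: only records which replica would be created.
        triplet = self.binding[key]
        self.copies.append(triplet)
        return triplet
```

A disabled key returns `None` from `hit`, while a bound key dispatches to the handler on the very next hit, which is the "decision to next access" latency the text describes.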

[0018] S104, when a process accesses the hot memory page again, the static probe point based on the jump tag is triggered. At this time, the preset local copy creation operation is performed using the jump tag to create a local copy of the hot memory page on the target NUMA node and update the page table mapping relationship of the process.

[0019] When a process accesses the hot memory page again, for example, if a page fault is triggered due to a page miss, the code execution reaches the entry point of the page fault function, triggering the already enabled static probe point. At this point, it immediately jumps to the `handle_numa_copy()` function, which performs the pre-defined local copy creation operation: First, on the target NUMA node (the node where the current process resides), a new physical page frame is allocated for the hot memory page. Then, the hot memory page from the remote node is copied to this newly allocated physical page frame in a non-blocking manner. After the copy is complete, the virtual address mapping in the process's page table is updated to point to the newly created local copy. Afterward, all subsequent accesses to this virtual address by the process will directly hit the local copy of the hot memory page, completely eliminating access latency across NUMA nodes. Similarly, static probe points based on jump tags, leveraging the characteristics of the Linux kernel, generate almost no CPU overhead, achieving the effect of reducing access latency and minimizing resource waste in NUMA multi-node multi-copy scenarios.
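The three steps of [0019] (allocate, copy, remap) can be sketched with a toy memory model. This is a synchronous, user-space illustration under stated assumptions, not the kernel path: `ToyMemory` and `create_local_replica` are hypothetical names, frames are modelled as byte strings, and the non-blocking copy described in the text is replaced by a plain assignment.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Frame = Tuple[int, int]  # (NUMA node id, frame number)


@dataclass
class ToyMemory:
    """Per-node physical frames, modelled as byte strings."""
    frames: Dict[Frame, bytes] = field(default_factory=dict)
    next_frame: Dict[int, int] = field(default_factory=dict)

    def alloc(self, node: int) -> Frame:
        """Hand out the next unused frame number on `node`."""
        number = self.next_frame.get(node, 0)
        self.next_frame[node] = number + 1
        frame = (node, number)
        self.frames[frame] = b""
        return frame


def create_local_replica(mem: ToyMemory, page_table: Dict[int, Frame],
                         vaddr: int, target_node: int) -> Frame:
    """Allocate on target_node, copy the remote page, then remap vaddr."""
    source = page_table[vaddr]
    if source[0] == target_node:
        return source                       # already local, nothing to do
    local = mem.alloc(target_node)          # step 1: new frame on the target node
    mem.frames[local] = mem.frames[source]  # step 2: copy the page contents
    page_table[vaddr] = local               # step 3: point the mapping at the copy
    return local
```

The remote frame is left in place, so the result is a replica rather than a migration; later lookups of `vaddr` resolve to the local frame.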

[0020] This embodiment inserts a static probe point based on jump tags into the Linux kernel and initializes it to a disabled state to avoid performance overhead under the default path. It then monitors memory page access data from remote NUMA nodes, performing multi-dimensional hotspot scoring to identify hot memory pages. In response to the hotspot identification results, it dynamically enables the corresponding jump tag through atomic operations, switching the static probe point to an enabled state. Subsequently, when a process accesses a hot memory page again, the enabled jump tag is triggered, and a preset local copy creation operation is executed, creating a local copy on the target NUMA node and updating the page table mapping. By using the static probe point, which is disabled by default, no branch judgment overhead is incurred when the code executes the default path, achieving zero interference with system performance. By dynamically enabling the jump tag, the latency from decision to copy trigger is shortened to the next memory access, avoiding the latency of waiting for periodic sampling, achieving a near-instantaneous optimization response. This is completely transparent to upper-layer applications, requiring no modification to application code or user intervention, significantly reducing overall system access latency and resource overhead.

[0021] Optionally, the access status and lifecycle events of the local copy of the hot memory page are monitored in real time. When the preset recycling conditions are met, the jump tag key value corresponding to the hot memory page is disabled, so that the static probe point based on the jump tag is switched from enabled to disabled, and the resources associated with the jump tag key value are released.

[0022] To ensure efficient resource reuse, the entire lifecycle of local copies can be monitored and managed. This involves real-time monitoring of the access status and lifecycle events of local copies of hot memory pages, continuously tracking information such as the last access time, reference count, and pinned state of each local copy. When a local copy meets the preset reclamation conditions, a reclamation operation is triggered. First, the jump tag key value corresponding to the hot memory page is disabled, switching the static probe point back to a disabled state. Then, the resources associated with the jump tag key value, including mapping relationships and kernel data structures, are released. Finally, the physical memory page frames occupied by the local copy are released.

[0023] In one optional implementation of this embodiment, the preset recycling conditions include: the local copy is not in a pinned state, or the page reference count of the local copy is 1, or the local copy has expired, or the number of local copies of the same hot memory page exceeds the preset upper limit for copies of the same page, or the local copy is not on the node where the current process is located. When any of these conditions holds, the local copy of the hot memory page is reclaimed.

[0024] The preset reclamation conditions are met, and the reclamation of the local copy of the hot memory page is triggered, when any of the following holds:

The local copy is not in a pinned state, for example the REPLICA_PINNED flag is not set;

The local copy has a page reference count of 1, meaning it is used only by the current process;

The local copy has expired, as determined by the time decay model, that is, it has not been accessed for a certain period of time;

The number of local copies of the same hot memory page exceeds the preset upper limit for copies of the same page; for example, with the same hot memory page as the source page, at most 3 local copies are allowed on each node;

The local copy is located on a node other than the node where the process currently accessing it resides (e.g., the process has been migrated to another node).
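The conditions in [0024] combine with a logical OR, which can be written as a single predicate. The sketch below follows the text literally; the `Replica` fields, the function name, and the `expiry_s` parameter are assumptions made for illustration, since the text gives no concrete expiry period.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    pinned: bool         # models the REPLICA_PINNED flag
    refcount: int        # page reference count
    last_access: float   # seconds, same clock as `now`
    node: int            # NUMA node holding this replica


def should_reclaim(replica: Replica, *, now: float, current_node: int,
                   same_page_count: int, expiry_s: float,
                   max_same_page: int = 3) -> bool:
    """True if ANY reclamation condition from [0024] holds."""
    return (
        not replica.pinned                          # not pinned
        or replica.refcount == 1                    # used only by the current process
        or now - replica.last_access > expiry_s     # expired (expiry_s is hypothetical)
        or same_page_count > max_same_page          # too many copies of one source page
        or replica.node != current_node             # replica is no longer local
    )
```

Read literally, an unpinned replica is always reclaimable; an implementation that wants replicas to survive between scans would have to pin them or tighten the rule, which the text does not specify.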

[0025] By setting a recycling timer, the local replicas are periodically checked to see if they meet the preset recycling conditions. When the node memory pressure is too high, the local replicas are recycled according to the node's own recycling strategy. When creating a new local replica, if the number of local replicas exceeds the preset maximum number threshold, the old local replicas are actively recycled.

[0026] A recycling strategy can be implemented to dynamically and automatically reclaim local replicas. A recycling timer can be set to periodically (e.g., every 5 minutes) scan all local replicas to check whether they meet the preset recycling conditions. To keep the memory load on nodes balanced, local replicas that meet the conditions can be reclaimed according to the node's own memory recycling strategy when node memory pressure is too high. To ensure that no duplicate or long-idle replicas continuously occupy resources, when a new local replica is created and the number of local replicas using the same hot memory page as the source page exceeds a preset maximum threshold (e.g., more than 3), the replica with the oldest timestamp is actively reclaimed. Through this replica recycling mechanism, unused replicas are automatically released based on factors such as access status, reference count, and expiration time. With three triggers (periodic scanning, memory pressure, and active recycling on creation), this effectively avoids wasting memory resources.
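The active-recycling trigger in [0026] (evict the oldest replica once more than 3 share a source page) can be sketched as follows. The function name, the dictionary layout, and the `(timestamp, node)` record shape are hypothetical choices for this illustration; the periodic and memory-pressure triggers are not modelled.

```python
from typing import Dict, List, Tuple

ReplicaRecord = Tuple[float, int]  # (creation timestamp, NUMA node)


def add_replica(replicas: Dict[int, List[ReplicaRecord]], source_page: int,
                created_at: float, node: int,
                max_per_page: int = 3) -> List[ReplicaRecord]:
    """Register a new replica and return any replicas evicted to stay in the cap."""
    entries = replicas.setdefault(source_page, [])
    entries.append((created_at, node))
    evicted: List[ReplicaRecord] = []
    while len(entries) > max_per_page:
        oldest = min(entries, key=lambda record: record[0])  # oldest timestamp
        entries.remove(oldest)
        evicted.append(oldest)
    return evicted
```

The caller would release the page frame and jump tag key of each evicted record; that bookkeeping is left out here.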

[0027] Embodiment 2

Figure 2 is a flowchart of a non-uniform memory access data multiple copy method based on jump tags according to Embodiment 2 of the present invention. This embodiment builds on and refines the above embodiment. In this embodiment, S101 is refined as follows: the static probe points based on jump tags are inserted at the page fault handling function and the task scheduling migration function in the Linux kernel, wherein each jump tag includes a jump tag key value; the static probe points based on jump tags are initially disabled, so no replica creation operation is triggered when the code executes the default path.

[0028] Accordingly, the non-uniform memory access data multiple copy method based on jump tags provided in this embodiment specifically includes: S201, Insert the static probe point based on the jump label at the page fault handling function and the task scheduling migration function in the Linux kernel, wherein the jump label includes a jump label key value.

[0029] During Linux kernel compilation, static probe points based on jump labels are inserted at the entry points (starting positions) of critical kernel functions, such as the page fault handling function `do_page_fault` and the task scheduling migration function `sched_migrate_task`. When a process accesses a memory page, if the page table entry is missing or requires permission checks, a page fault will occur. This is an ideal location to catch the first access to a remote node page or access to an unmapped page, intervening before the process experiences latency. When a process migrates from one NUMA node to another, the data replica on the original node may no longer be in the optimal position. Inserting static probe points here allows for the preparation of a local copy for the migrated process in advance, enabling migration-related optimizations.

[0030] S202, the static probe point based on the jump label is initially disabled, and the replica creation operation will not be triggered when the code executes the default path.

[0031] Each jump label contains a jump label key, which uniquely identifies a specific memory page or optimization operation. At initialization, all jump labels are in the disabled state. All code execution then follows the default path and neither calls the page copy function nor triggers the copy creation operation, ensuring that the kernel default path is unaffected and no performance overhead is incurred.

[0032] S203: A background monitoring thread continuously collects memory page access data of remote NUMA nodes, and hotspot scores are calculated using a preset multi-dimensional hotspot scoring algorithm. When the hotspot score exceeds the preset hotspot threshold, the corresponding memory page is identified as a hot memory page, and a local page replica creation operation is initiated.

[0033] Through a kernel background thread or event-triggered mechanism, such as a monitoring and decision-making module, a lightweight monitoring trigger point can be inserted into the NUMA balancing path of the Linux kernel scheduler. The monitoring is driven by page fault events to continuously monitor the status of NUMA nodes. Then, the hot spot score of each memory page is calculated based on a preset multi-dimensional hot spot scoring algorithm using the acquired memory page access data.

[0034] Specifically, the preset multi-dimensional hotspot scoring algorithm includes the following components.

Remote access ratio score (0-40 points): calculated from the proportion of accesses to the memory page that come from remote NUMA nodes out of the total number of accesses:

  Remote access ratio = (Number of remote accesses / Total number of accesses) × 100%
  Remote access ratio score = min(Remote access ratio × 40 / 100, 40)

Access frequency score (0-30 points): based on the historical access data of the memory page, a piecewise linear decay model assigns weights to historical accesses, with more recent accesses receiving higher weights, so that the score reflects the time decay of access popularity:

  Decay weight =
    100%                                (age < 60 seconds)
    100% - (age - 60) × 50 / 3540       (60 seconds ≤ age < 3600 seconds)
    50% - (age - 3600) × 40 / 82800     (3600 seconds ≤ age < 86400 seconds)
    10%                                 (age ≥ 86400 seconds)

  Weighted access count = Total access count × Decay weight / 100
  Access frequency score = min(log2(Weighted access count + 1) × 5, 30)

Access intensity score (0-20 points): calculated from the time span of a preset number of the most recent accesses to the memory page. It reflects the intensity of burst access as the number of accesses per unit time. A circular buffer maintains an 8-element timestamp array for each candidate memory page, recording the timestamps of the 8 most recent accesses. The number of accesses within the most recent 1-second (1,000,000,000 nanoseconds) time window is counted, and each access contributes 2.5 points:

  Access intensity score = Number of accesses within the last 1 second × 2.5

If there are 8 accesses within 1 second, the score is 8 × 2.5 = 20 points (full marks); if there are 4 accesses within 1 second, the score is 4 × 2.5 = 10 points; if there is 1 access within 1 second, the score is 1 × 2.5 = 2.5 points.
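The three formulas of [0034] translate directly into code. The following is a minimal sketch under stated assumptions: the function and class names are hypothetical, `age_s` is taken to be the time in seconds since the page was last accessed (the text does not define "age" precisely), and a page with no recorded accesses is assumed to score 0 on the ratio dimension.

```python
import math
from collections import deque


def remote_ratio_score(remote_accesses: int, total_accesses: int) -> float:
    """Remote access ratio score, capped at 40 points."""
    if total_accesses == 0:
        return 0.0  # assumption: no recorded accesses means no score
    ratio_pct = remote_accesses / total_accesses * 100.0
    return min(ratio_pct * 40.0 / 100.0, 40.0)


def decay_weight(age_s: float) -> float:
    """Piecewise linear decay weight in percent (100 down to 10)."""
    if age_s < 60:
        return 100.0
    if age_s < 3600:
        return 100.0 - (age_s - 60) * 50.0 / 3540.0
    if age_s < 86400:
        return 50.0 - (age_s - 3600) * 40.0 / 82800.0
    return 10.0


def frequency_score(total_accesses: int, age_s: float) -> float:
    """Access frequency score, capped at 30 points."""
    weighted = total_accesses * decay_weight(age_s) / 100.0
    return min(math.log2(weighted + 1.0) * 5.0, 30.0)


class IntensityTracker:
    """Ring buffer of the 8 most recent access timestamps (nanoseconds)."""
    WINDOW_NS = 1_000_000_000

    def __init__(self) -> None:
        self.stamps = deque(maxlen=8)  # oldest entry drops out automatically

    def record(self, now_ns: int) -> None:
        self.stamps.append(now_ns)

    def score(self, now_ns: int) -> float:
        """2.5 points per access inside the last 1-second window (max 20)."""
        recent = sum(1 for t in self.stamps if now_ns - t < self.WINDOW_NS)
        return recent * 2.5
```

The decay segments meet at their boundaries (50% at 3600 seconds, 10% at 86400 seconds), so the weight is continuous; the 8-slot buffer is what bounds the intensity score at 20 points.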

[0035] Migration benefit score (0-10 points): based on the difference in access latency between the memory page on the remote NUMA node and on the local NUMA node, and on the cost of migrating the page between nodes, the migration benefit score is calculated using the following formulas:

  Expected latency reduction = Remote access latency - Local access latency
  Migration cost = Page size × Reciprocal of network bandwidth
  Net benefit = Expected latency reduction - Migration cost

Remote access latency can be estimated from the cycle count of the PMU event MEM_LOAD_RETIRED.REMOTE_DRAM; local access latency can be estimated from the cycle count of MEM_LOAD_RETIRED.LOCAL_DRAM; network bandwidth can be obtained from the inter-node bandwidth in the NUMA topology information; the page size is 4 KB or 16 KB for standard pages. A migration benefit score of 0-10 is assigned based on the net benefit range:

  Net benefit > 100 ns: 10 points
  Net benefit > 50 ns: 7 points
  Net benefit > 20 ns: 4 points
  Net benefit > 0 ns: 2 points
  Net benefit ≤ 0 ns: 0 points

The remote access ratio score, access frequency score, access intensity score, and migration benefit score are summed and compared with a preset hotspot scoring rule to determine the hotspot level of the memory page. The hotspot score is calculated as follows:

  Total score = Remote access ratio score + Access frequency score + Access intensity score + Migration benefit score

The total score ranges from 0 to 100 points. When the total score is ≥ 75, the page is considered a hot page. The specific scoring rules are:

  HOTPAGE_NONE (non-hot page): score 0-50
  HOTPAGE_WARM (warm page): score 51-74
  HOTPAGE_HOT (hot page): score 75-87
  HOTPAGE_CRITICAL: score 88-100
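The migration benefit bands and the level classification of [0035] can be sketched as two small functions. Several points here are assumptions made for illustration: all latencies are treated as already converted to nanoseconds (the text estimates them from PMU cycle counts), bandwidth is expressed in bytes per nanosecond, and a fractional total that falls between two integer bands is assigned to the lower band. The function names are hypothetical.

```python
def migration_benefit_score(remote_ns: float, local_ns: float,
                            page_bytes: int, bandwidth_bytes_per_ns: float) -> int:
    """Map net benefit (ns) to the 0-10 point bands from [0035]."""
    expected_reduction = remote_ns - local_ns
    migration_cost = page_bytes * (1.0 / bandwidth_bytes_per_ns)
    net_benefit = expected_reduction - migration_cost
    if net_benefit > 100:
        return 10
    if net_benefit > 50:
        return 7
    if net_benefit > 20:
        return 4
    if net_benefit > 0:
        return 2
    return 0


def total_score(ratio: float, frequency: float, intensity: float, benefit: float) -> float:
    """Sum of the four dimension scores (0-100)."""
    return ratio + frequency + intensity + benefit


def classify(total: float) -> str:
    """Hotspot level for a total score; >= 75 means the page is hot."""
    if total >= 88:
        return "HOTPAGE_CRITICAL"
    if total >= 75:
        return "HOTPAGE_HOT"
    if total >= 51:
        return "HOTPAGE_WARM"
    return "HOTPAGE_NONE"
```

For example, with a hypothetical 300 ns remote latency, 90 ns local latency, a 4096-byte page, and 64 bytes/ns of inter-node bandwidth, the migration cost is 64 ns and the net benefit is 146 ns, which lands in the top band.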

[0036] S204, in response to the local page copy creation operation, the jump tag corresponding to the hot memory page is enabled through an atomic operation, so that the static probe point based on the jump tag is switched from the disabled state to the enabled state, and the jump tag key value corresponding to the hot page is dynamically allocated.

[0037] Specifically, dynamically allocating the jump tag key value corresponding to a hot page includes: first, maintaining a global jump tag key-value pool, which includes the maximum number of key values, the least-used linked list, and the idle key-value linked list.

[0038] To address the potential key-value explosion problem in large-scale systems, a global jump label key-value pool is maintained to enable dynamic allocation and reuse of keys: this pool includes the maximum number of keys (max_keys), the least frequently used list (lru_list), and the free list (free_list).

[0039] When allocating the jump tag key value corresponding to a hot page, it is determined whether the current number of used key values has reached the maximum number of key values. If not, a key value is allocated from the idle key-value list; if the maximum has been reached, the page copy corresponding to the least-used key value in the least-used list is forcibly reclaimed, and that key value is reused.

[0040] When a jump tag key value needs to be allocated for a hot memory page, first check whether the current number of used key values has reached max_keys. If not, allocate a new key value from free_list and establish a mapping between the key value and the memory page. If max_keys has been reached, take the least-used key value from lru_list, forcibly reclaim the page copy corresponding to that key value to release the key value, and then reassign the key value to the current hot page. After that, update lru_list by adding the newly allocated key value to the end of the linked list.
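The allocation procedure of [0040] can be sketched as a small pool class. `max_keys`, `free_list`, and `lru_list` follow the names in the text; `KeyPool`, `touch`, `release`, and the `reclaim` callback are hypothetical. The sketch assumes `lru_list` is ordered from least to most recently used and that `max_keys` is at least 1.

```python
from collections import OrderedDict, deque
from typing import Callable, Hashable


class KeyPool:
    """Bounded pool of jump tag keys with free-list and LRU reuse."""

    def __init__(self, max_keys: int, reclaim: Callable[[int, Hashable], None]) -> None:
        self.max_keys = max_keys
        self.free_list = deque(range(max_keys))   # keys never handed out or released
        self.lru_list = OrderedDict()             # key -> page, least recent first
        self.reclaim = reclaim                    # forcibly reclaims a page copy

    def allocate(self, page: Hashable) -> int:
        if len(self.lru_list) < self.max_keys:
            key = self.free_list.popleft()        # pool not full: take a free key
        else:
            key, old_page = self.lru_list.popitem(last=False)
            self.reclaim(key, old_page)           # evict the least-used key's copy
        self.lru_list[key] = page                 # new key goes to the tail
        return key

    def touch(self, key: int) -> None:
        """Mark a key as recently used."""
        self.lru_list.move_to_end(key)

    def release(self, key: int) -> None:
        """Return a key to the free list after its replica is reclaimed."""
        del self.lru_list[key]
        self.free_list.append(key)
```

Because every in-use key is either in `lru_list` or back in `free_list`, the number of live keys can never exceed `max_keys`, which is the bound the pool exists to enforce.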

[0041] S205, when a process accesses a hot memory page again, the static probe point based on the jump tag is triggered. At this time, the preset local copy creation operation is performed using the jump tag to create a local copy of the hot memory page on the target NUMA node and update the page table mapping relationship of the process.

[0042] Specifically, the local copy creation operation includes: allocating a new physical page frame on the target NUMA node. The kernel memory allocation interface (such as alloc_pages_node) is called to allocate a new physical page frame on the target NUMA node. The allocation should take the node's memory watermarks into account to avoid forcing an allocation when memory is scarce.

[0043] Initiating a non-blocking DMA copy operation to copy the hot memory page from the remote NUMA node to the newly allocated physical page frame. An asynchronous DMA copy mechanism (such as copy_page_async or an asynchronous transfer based on dmaengine) copies the contents of the hot memory page on the remote NUMA node to the newly allocated local page frame. First, a DMA descriptor is prepared, specifying the source physical address (remote page), destination physical address (local page), and page size; then the DMA request is submitted and control returns immediately without waiting for the copy to complete; the DMA controller then performs the copy in the background and triggers an interrupt or callback function upon completion. As part of the copy, it is also necessary to check whether the page is migratable. If it is, a new page is immediately allocated on the target node and an asynchronous migration task is created; the page reference count is then incremented to prevent the page from being released. At this point a temporary mapping is set that points to the new page but is marked as being copied. The asynchronous copy task is then submitted to the work queue or DMA, the copy status is checked continuously, and the being-copied mark is cleared upon completion, which completes the copy process.

[0044] Atomic update of the corresponding virtual address mapping in the process page table.

[0045] The page table update is performed in the callback function invoked after the DMA copy completes: the process's page table lock is acquired; the page table entry corresponding to the virtual address is changed from pointing to the remote page frame to pointing to the newly allocated local page frame; the corresponding TLB entry is flushed (using `flush_tlb_page`, or flushed automatically upon subsequent access); and the lock is released. The page table update must be atomic to ensure that no other CPU core observes an inconsistent mapping during the update. If the DMA copy or the page table update fails, the newly allocated local page frame is released, the original remote mapping remains unchanged, and the corresponding jump tag is disabled again to prevent subsequent accesses from retrying the failed copy creation operation.
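The allocate, copy, and remap sequence with rollback described above can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions, not kernel code: `Node`, `AddressSpace`, `ReplicaJob`, and `create_local_replica` are hypothetical names, a thread pool stands in for the DMA engine, a dictionary stands in for the page table, and a plain lock stands in for the page table lock. Skipping replication when the watermark check fails is also an assumption.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 4096


class Node:
    """Toy NUMA node: a count of free page frames plus a low watermark."""

    def __init__(self, node_id, free_frames, low_watermark):
        self.node_id = node_id
        self.free_frames = free_frames
        self.low_watermark = low_watermark

    def alloc_frame(self):
        # Respect the watermark instead of forcing an allocation.
        if self.free_frames <= self.low_watermark:
            return None
        self.free_frames -= 1
        return bytearray(PAGE_SIZE)

    def free_frame(self):
        self.free_frames += 1


class AddressSpace:
    """Toy address space: vaddr -> (node_id, frame), guarded by a lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.page_table = {}


class ReplicaJob:
    """Handle returned to the caller; `done` is set when the job finishes."""

    def __init__(self):
        self.done = threading.Event()
        self.ok = False


def create_local_replica(mm, vaddr, target, executor, key, copy_fn=bytes):
    """Start a non-blocking copy; remap in the completion callback."""
    job = ReplicaJob()
    _, remote_frame = mm.page_table[vaddr]
    frame = target.alloc_frame()
    if frame is None:
        # Assumption: under memory pressure, replication is simply skipped.
        job.done.set()
        return job

    def do_copy():
        # Stand-in for the asynchronous DMA transfer.
        frame[:] = copy_fn(remote_frame)

    def on_complete(future):
        try:
            future.result()  # re-raises any copy error
            with mm.lock:    # swap the mapping under the lock
                mm.page_table[vaddr] = (target.node_id, frame)
            job.ok = True
        except Exception:
            # Rollback: free the new frame, keep the remote mapping,
            # and disable the key so later accesses do not retry.
            target.free_frame()
            key["enabled"] = False
        finally:
            job.done.set()

    executor.submit(do_copy).add_done_callback(on_complete)
    return job
```

The caller gets the `ReplicaJob` back immediately and can wait on `job.done` if it needs the outcome. Passing a `copy_fn` that raises exercises the rollback path.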

[0046] This embodiment deploys static probe points in specific kernel hot paths, such as the page fault handling function and the task scheduling migration function, so that the optimization can intervene at critical moments. It employs a multi-dimensional hotspot scoring algorithm that scores each page on four dimensions (remote access ratio, access frequency, access intensity, and migration benefit) and classifies pages into multiple hotspot levels, which identifies hot memory pages accurately and avoids ineffective migrations. By maintaining a global jump tag key-value pool and using free and LRU lists to allocate and reclaim keys dynamically, it prevents the number of keys from growing indefinitely and keeps system resource usage controllable and scalable. The local copy creation operation combines asynchronous DMA copying with atomic page table updates and adds error handling and rollback, so that copies are created with minimal performance interference. Only the first access experiences a slight delay, and subsequent accesses all hit the local copy. The method covers critical scenarios such as initial process access and process migration, further improving the coverage and timeliness of access latency optimization.
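A bounded key pool with a free list and an LRU list, as described above, might look like the following minimal sketch. The class name `JumpKeyPool`, its methods, and the `reclaim` callback are hypothetical; the text does not specify the data structures beyond the two lists and the maximum key count.

```python
from collections import OrderedDict


class JumpKeyPool:
    """Fixed-size pool of key ids backed by a free list and an LRU list."""

    def __init__(self, max_keys):
        self.max_keys = max_keys
        self.free = list(range(max_keys))  # idle key ids
        self.lru = OrderedDict()           # key id -> page, oldest first

    def allocate(self, page, reclaim):
        """Return a key for `page`, evicting the oldest key if the pool is full."""
        if self.free:
            key = self.free.pop()
        else:
            # Pool exhausted: forcibly reclaim the least recently used key
            # together with the page copy it was guarding, then reuse it.
            key, victim = self.lru.popitem(last=False)
            reclaim(victim)
        self.lru[key] = page
        return key

    def touch(self, key):
        """Mark a key as recently used."""
        self.lru.move_to_end(key)

    def release(self, key):
        """Return a key to the free list once its copy is recycled."""
        self.lru.pop(key)
        self.free.append(key)
```

Because allocation either takes an idle key or reuses an evicted one, the number of live keys never exceeds `max_keys`, which is the property the pool exists to guarantee.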

[0047] Embodiment 3. Figure 3 is a schematic diagram of a non-uniform memory access data multiple copy device based on jump tags according to Embodiment 3 of the present invention. In this embodiment, the non-uniform memory access data multiple copy device based on jump tags includes: the probe configuration module 810, used to insert static probe points based on jump tags into the Linux kernel; the hot page evaluation module 820, used to monitor the access data of memory pages of remote NUMA nodes and perform multi-dimensional hot page score calculation, and, when the hot page score exceeds the preset hot page threshold, to identify the corresponding memory page as a hot memory page and initiate a local page copy creation operation; the jump tag configuration module 830, used to respond to the local page copy creation operation, enable the jump tag corresponding to the hot memory page through an atomic operation, and dynamically allocate the jump tag key value corresponding to the hot page; and the local copy creation module 840, used to perform the local copy creation operation through the jump tag when a process accesses the hot memory page again, creating a local copy of the hot memory page on the target NUMA node and updating the page table mapping relationship of the process.

[0048] This embodiment inserts static probe points based on jump tags into the Linux kernel through a probe point configuration module. A hot page evaluation module monitors memory page access and identifies hot memory pages using multi-dimensional hotspot scoring. The jump tag configuration module responds to the local copy creation operation of hot memory pages, dynamically assigning them corresponding jump tag key-value pairs. When a process accesses a hot memory page again, the local copy creation module creates a local copy using the jump tag and updates the page table mapping. By using static probe points, which are disabled by default, no branching overhead occurs when the code executes the default path, achieving zero interference with system performance. By dynamically enabling jump tags, the latency from decision to copy trigger is shortened to the next memory access, avoiding the latency of waiting for periodic sampling, achieving near-instantaneous optimization response. This is completely transparent to upper-layer applications, requiring no modification to application code or user intervention, significantly reducing overall system access latency and resource overhead.
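The interaction between the modules can be sketched as a toy model: a monitoring pass scores pages and enables the probe for hot ones, and the next access to such a page triggers copy creation. Everything here is an assumption made for illustration: the names `PageSample`, `hotspot_score`, `ProbeSite`, and `monitor_pass`, the threshold value, and the convention that each sub-score lies in [0, 1]. A real jump tag patches the instruction stream, so the disabled path carries no check at all; the set lookup below only models the observable behavior.

```python
from dataclasses import dataclass

HOT_THRESHOLD = 2.5  # hypothetical; assumes each sub-score lies in [0, 1]


@dataclass
class PageSample:
    remote_accesses: int
    total_accesses: int
    frequency: float   # access frequency score, assumed precomputed
    intensity: float   # access intensity score, assumed precomputed
    benefit: float     # migration benefit score, assumed precomputed


def hotspot_score(s):
    """Sum of the four dimensions named in the text."""
    ratio = s.remote_accesses / s.total_accesses if s.total_accesses else 0.0
    return ratio + s.frequency + s.intensity + s.benefit


class ProbeSite:
    """Models a probe point that is disabled by default for every page."""

    def __init__(self, create_replica):
        self.enabled = set()
        self.create_replica = create_replica

    def on_access(self, page):
        if page not in self.enabled:
            return False           # default path: nothing happens
        self.create_replica(page)  # enabled path: create the local copy
        return True


def monitor_pass(site, samples):
    """One pass of the background monitor: enable probes for hot pages."""
    for page, sample in samples.items():
        if hotspot_score(sample) > HOT_THRESHOLD:
            site.enabled.add(page)
```

In this model the delay between the decision and the copy is exactly one access: the first `on_access` call after `monitor_pass` marks a page hot is the one that invokes `create_replica`.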

[0049] The non-uniform memory access data multiple copy device based on jump tags provided in the embodiments of the present invention can execute the non-uniform memory access data multiple copy method based on jump tags provided in any embodiment of the present invention, and has the corresponding functional modules and beneficial effects of the execution method.

[0050] Embodiment 4. Figure 4 is a structural diagram of an electronic device according to Embodiment 4 of the present invention. Figure 4 shows a block diagram of an exemplary electronic device 12 suitable for implementing embodiments of the present invention. The electronic device 12 shown in Figure 4 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments of the present invention.

[0051] As shown in Figure 4, the electronic device 12 is represented in the form of a general-purpose computing device. The components of the electronic device 12 may include, but are not limited to: one or more processors or processing units 16, system memory 28, and bus 18 connecting different system components (including system memory 28 and processing unit 16).

[0052] Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

[0053] Electronic device 12 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by electronic device 12, including volatile and non-volatile media, removable and non-removable media.

[0054] System memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. Electronic device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Figure 4; usually referred to as a "hard drive"). Although not shown in Figure 4, a disk drive for reading from and writing to a removable non-volatile disk (e.g., a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 via one or more data media interfaces. System memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of the embodiments of the present invention.

[0055] A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data. Each or some combination of these examples may include an implementation of a network environment. Program modules 42 typically perform the functions and/or methods described in the embodiments of the present invention.

[0056] Electronic device 12 can also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with the electronic device 12/server/computer, and/or with any device that enables the electronic device 12 to communicate with one or more other computing devices (e.g., network card, modem, etc.). This communication can be performed through input/output (I/O) interface 22. Furthermore, electronic device 12 can also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via network adapter 20. As shown in Figure 4, network adapter 20 communicates with other modules of electronic device 12 via bus 18. It should be understood that, although not shown in Figure 4, other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

[0057] The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, such as implementing the non-uniform memory access data multiple copy method based on jump tags provided in the embodiments of the present invention.

[0058] Embodiment 5. Embodiment 5 of the present invention also provides a storage medium containing computer-executable instructions, which, when executed by a computer processor, are used to perform the non-uniform memory access data multiple copy method based on jump tags provided in the above embodiments.

[0059] The computer storage medium of this invention can be any combination of one or more computer-readable media. A computer-readable medium can be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media (a non-exhaustive list) include: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof. In this document, a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.

[0060] Computer-readable signal media may include data signals propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. Computer-readable signal media may also be any computer-readable medium other than computer-readable storage media, capable of sending, propagating, or transmitting programs for use by or in connection with an instruction execution system, apparatus, or device.

[0061] Program code contained on a computer-readable medium may be transmitted using any suitable medium, including, but not limited to, wireless, wire, optical fiber, RF, etc., or any suitable combination thereof.

[0062] Computer program code for performing the operations of this invention can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as "C" or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0063] Note that the above description is merely a preferred embodiment of the present invention and the technical principles employed. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in detail through the above embodiments, the present invention is not limited to the above embodiments, and may include many other equivalent embodiments without departing from the concept of the present invention, the scope of which is determined by the scope of the appended claims.

Claims

1. A method for multiple copies of non-uniform memory access data based on jump tags, characterized in that it includes: S101: inserting static probe points based on jump tags into the Linux kernel; S102: continuously collecting, by a background monitoring thread, memory page access data of remote NUMA nodes and calculating hotspot scores using a preset multi-dimensional hotspot scoring algorithm, and, when a hotspot score exceeds the preset hotspot threshold, identifying the corresponding memory page as a hot memory page and initiating a local page copy creation operation; S103: in response to the local page copy creation operation, enabling the jump tag corresponding to the hot memory page through an atomic operation, so that the static probe point based on the jump tag is switched from the disabled state to the enabled state, and dynamically allocating the jump tag key value corresponding to the hot page; S104: when a process accesses the hot memory page again, triggering the static probe point based on the jump tag, and performing the preset local copy creation operation through the jump tag to create a local copy of the hot memory page on the target NUMA node and update the page table mapping relationship of the process.

2. The method according to claim 1, characterized in that the method further includes: monitoring in real time the access status and lifecycle events of the local copy of the hot memory page; and, when the preset recycling conditions are met, disabling the jump tag key value corresponding to the hot memory page, so that the static probe point based on the jump tag is switched from enabled to disabled, and releasing the resources associated with the jump tag key value.

3. The method according to claim 2, characterized in that the preset recycling conditions include: recycling the local copy of the hot memory page when the local copy is not in a fixed state, or the page reference count of the local copy is 1, or the local copy has expired, or the number of local copies of the same hot memory page exceeds the preset upper limit for copies of the same page, or the local copy is not on the node where the current process is located; periodically checking, by means of a recycling timer, whether the local copies meet the preset recycling conditions; recycling local copies according to the node's own recycling strategy when the node's memory pressure is too high; and, when creating a new local copy, actively recycling old local copies if the number of local copies exceeds the preset maximum number threshold.

4. The method according to claim 1, characterized in that S101 includes: inserting the static probe points based on jump tags at the page fault handling function and the task scheduling migration function in the Linux kernel, wherein the jump tags include jump tag key values; the static probe point based on the jump tag is initially disabled and does not trigger a copy creation operation when the code executes the default path.

5. The method according to claim 1, characterized in that the preset multi-dimensional hotspot scoring algorithm includes: calculating the remote access ratio score based on the proportion of accesses to the memory page made from a remote NUMA node to the total number of accesses; calculating the access frequency score from the historical access data of the memory page using a piecewise linear decay model; calculating the access intensity score based on the time span covered by a preset number of the most recent accesses to the memory page; calculating the migration benefit score based on the difference in access latency of the memory page between the remote NUMA node and the local NUMA node, and on the page network migration cost; and summing the remote access ratio score, access frequency score, access intensity score, and migration benefit score and comparing the sum with a preset hotspot scoring rule to determine the hotspot level of the memory page.

6. The method according to claim 1, characterized in that dynamically allocating the jump tag key value corresponding to the hot page includes: maintaining a global jump tag key-value pool that includes the maximum number of key values, a lowest-usage-frequency linked list, and an idle key-value linked list; when allocating the jump tag key value corresponding to a hot page, determining whether the number of key values currently in use has reached the maximum number of key values; if not, allocating the key value from the idle key-value linked list; if the maximum number has been reached, allocating the key value from the lowest-usage-frequency linked list, wherein the page copy corresponding to the least-used key value in that list is forcibly reclaimed and the key value is reused.

7. The method according to claim 1, characterized in that the preset local copy creation operation includes: allocating a new physical page frame on the target NUMA node; initiating a non-blocking DMA copy operation to copy the hot memory page of the remote NUMA node to the newly allocated physical page frame; and atomically updating the corresponding virtual address mapping in the process page table.

8. A non-uniform memory access data multiple copy device based on jump tags, used to implement the non-uniform memory access data multiple copy method based on jump tags as described in any one of claims 1-7, characterized in that it includes: a probe configuration module, used to insert static probe points based on jump tags into the Linux kernel; a hot page evaluation module, used to monitor the access data of memory pages of remote NUMA nodes and perform multi-dimensional hot page score calculation, and, when the hot page score exceeds the preset hot page threshold, to identify the corresponding memory page as a hot memory page and initiate a local page copy creation operation; a jump tag configuration module, used to respond to the local page copy creation operation, enable the jump tag corresponding to the hot memory page through an atomic operation, and dynamically allocate the jump tag key value corresponding to the hot page; and a local copy creation module, used to perform the local copy creation operation through the jump tag when a process accesses the hot memory page again, creating a local copy of the hot memory page on the target NUMA node and updating the page table mapping relationship of the process.

9. An electronic device, characterized in that the electronic device includes: one or more processors; and a storage device for storing one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the non-uniform memory access data multiple copy method based on jump tags as described in any one of claims 1-7.

10. A storage medium containing computer-executable instructions, which, when executed by a computer processor, are used to perform the jump tag-based non-uniform memory access data multiple copy method as described in any one of claims 1-7.