Memory allocation method and apparatus, chip, device, and readable storage medium
By leveraging the collaborative work of the cluster management module and the agent module and utilizing topology and memory information to optimize memory borrowing, the problems of memory waste and business performance degradation have been solved, achieving reasonable allocation and efficient access to memory resources.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- HUAWEI TECH CO LTD
- Filing Date
- 2025-09-03
- Publication Date
- 2026-06-18
Abstract
Description
Methods, devices, chips, equipment, and readable storage media for allocating memory
[0001] This application claims priority to Chinese Patent Application No. 202411813850.3, filed on December 10, 2024, entitled "Method, Apparatus, Chip, Device and Readable Storage Medium for Allocating Memory", the entire contents of which are incorporated herein by reference.
Technical Field
[0002] This application relates to the field of storage technology, and in particular to methods, apparatus, chips, devices and readable storage media for allocating memory.
Background Technology
[0003] With the continuous development of storage technology, memory sharing between different nodes has become possible. How to allocate memory from one node to another to achieve memory sharing between different nodes has become a problem worthy of attention.
Summary of the Invention
[0004] This application provides a method, apparatus, chip, device, and readable storage medium for allocating memory to enable memory sharing between different nodes. The technical solution provided by this application includes the following aspects.
[0005] In a first aspect, a method for allocating memory is provided. The method is applied to a management module corresponding to a cluster, and the cluster includes multiple nodes. In this method, a first node identifier is obtained, which indicates a first node among the multiple nodes. Topology information and memory information are obtained; the topology information indicates the connection relationships between at least some of the nodes, and the memory information indicates the memory usage of at least some of the nodes. A second node identifier, which indicates a second node among the multiple nodes, is obtained based on the topology information and the memory information. Then, memory of the second node is allocated to the first node.
[0006] Topology information indicates the connection relationships between different nodes, and memory information indicates the memory usage of nodes. Considering both when obtaining the second node identifier, that is, when determining the second node that lends memory to the first node, ensures that the first node and the second node have a suitable connection relationship and that the second node has suitable memory usage. This makes the determination of the second node more reasonable and accurate, thereby ensuring the rationality and accuracy of memory allocation.
[0007] In one possible implementation, the memory information includes at least one of the following: memory allocation relationships between different nodes in at least some of the nodes, memory allocation amounts between different nodes, or, for each of at least some of the nodes, its total memory, used memory, unused memory, borrowed memory, or lent memory.
[0008] This implementation provides multiple types of memory information. When obtaining the second node identifier, one type of memory information can be considered alone, or multiple types of memory information can be considered together. This is more flexible and improves the rationality and accuracy of memory allocation.
[0009] In one possible implementation, the method further includes: obtaining latency information, which indicates the duration of the access process allowed by the first node, where the access process is the process by which the first node accesses allocated memory; and obtaining a second node identifier based on topology information and memory information, including: obtaining the second node identifier based on the latency information, topology information, and memory information, wherein the duration of the first node accessing the memory of the second node is less than the duration indicated by the latency information.
[0010] In addition to the topology information and memory information mentioned above, latency information is also considered, so that the time the first node takes to access the memory of the second node is less than the duration indicated by the latency information. This keeps the actual access time within the allowable range and prevents the access process from affecting the business performance of the first node.
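As an illustration of how topology, memory, and latency information could be combined, the following is a minimal sketch and not the selection policy of this application. The names `NodeMemory` and `select_lender`, the representation of topology as per-link latencies, the restriction to directly connected nodes, and the tie-break rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class NodeMemory:
    """Hypothetical per-node memory record (sizes in GB)."""
    total: int
    used: int
    lent: int = 0

    @property
    def unused(self) -> int:
        return self.total - self.used - self.lent


def select_lender(borrower, needed_gb, max_latency_ns, links, memory):
    """Return the identifier of a node that can lend `needed_gb`, or None.

    links:  {(node_a, node_b): access latency in ns} for directly connected nodes
    memory: {node_id: NodeMemory}
    """
    candidates = []
    for (a, b), latency in links.items():
        if borrower not in (a, b):
            continue  # topology: only consider nodes connected to the borrower
        peer = b if a == borrower else a
        if latency >= max_latency_ns:
            continue  # latency: access must take less than the allowed duration
        if memory[peer].unused < needed_gb:
            continue  # memory: the lender must have enough unused memory
        candidates.append((latency, -memory[peer].unused, peer))
    # Assumed tie-break: lowest latency first, then the most unused memory.
    return min(candidates)[2] if candidates else None


links = {("n1", "n2"): 300, ("n1", "n3"): 150, ("n2", "n3"): 100}
memory = {
    "n1": NodeMemory(total=64, used=62),
    "n2": NodeMemory(total=64, used=10),
    "n3": NodeMemory(total=64, used=60),
}
second_node = select_lender("n1", needed_gb=8, max_latency_ns=400,
                            links=links, memory=memory)
```

In this example, n3 is closer to n1 but has too little unused memory, so n2 is chosen; tightening the latency limit to 200 ns leaves no suitable lender.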
[0011] In one possible implementation, obtaining the first node identifier includes: receiving a first allocation event, the first allocation event carrying a node identifier, the first allocation event being used to request memory allocation; and determining the node identifier carried by the first allocation event as the first node identifier.
[0012] The sender of the first allocation event triggers the memory allocation process by sending the first allocation event to the management module. This method is simple, fast, and highly practical. The sender of the first allocation event can be flexibly configured.
[0013] In one possible implementation, the first allocation event also carries a semantic identifier, which indicates presentation semantics. The presentation semantics indicate how the allocated memory is presented. Allocating memory of the second node to the first node includes: sending a second allocation event to the second node, the second allocation event indicating the allocation of memory; receiving a global address sent by the second node, the global address indicating the address of the memory of the second node; generating a third allocation event, which carries the global address and the semantic identifier and is used to allocate the memory of the second node and present the allocated memory according to the presentation method; and sending the third allocation event to the first node.
[0014] With the semantic identifier, after the first node obtains the allocated memory, that is, after the first node borrows the memory, it can present the borrowed memory to the upper layer (such as an application running on the first node) according to the semantic identifier. This helps meet the different needs of the upper layer and is more flexible.
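The exchange of the first, second, and third allocation events can be pictured with the following sketch. It is only an illustration under stated assumptions: the class names, the method names, the string encoding of the semantic identifier, the `size_gb` parameter, and the fixed global address returned by the lender are all hypothetical, and direct method calls stand in for messages sent over the bus.

```python
from dataclasses import dataclass


@dataclass
class FirstAllocationEvent:
    node_id: str       # identifies the first node (the borrower)
    semantic_id: str   # assumed encoding, e.g. "numa", "device", "file"


@dataclass
class ThirdAllocationEvent:
    global_address: int
    semantic_id: str


class LenderAgent:
    """Stand-in for the second node; returns a made-up global address."""

    def on_second_allocation_event(self, size_gb: int) -> int:
        return 0x1000_0000


class BorrowerAgent:
    """Stand-in for the first node; records how memory is presented."""

    def __init__(self):
        self.presented = []

    def on_third_allocation_event(self, event: ThirdAllocationEvent) -> None:
        self.presented.append((event.semantic_id, event.global_address))


def allocate(event, lender, borrower, size_gb=2):
    """Management-module side of the flow (size_gb is an assumed parameter)."""
    # Second allocation event: ask the second node to allocate memory,
    # and receive the global address it sends back.
    global_address = lender.on_second_allocation_event(size_gb)
    # Third allocation event: carries the global address and the semantic identifier.
    third = ThirdAllocationEvent(global_address, event.semantic_id)
    borrower.on_third_allocation_event(third)
    return third


borrower = BorrowerAgent()
third = allocate(FirstAllocationEvent("node-1", "numa"), LenderAgent(), borrower)
```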
[0015] In one possible implementation, the method also includes: canceling the allocation of the memory of the second node to the first node.
[0016] In addition to allocating memory, memory allocation can also be cancelled according to actual needs, enabling reasonable memory scheduling between different nodes, which is quite flexible.
[0017] In one possible implementation, canceling the allocation of the memory of the second node to the first node includes: sending a first cancellation event to the first node, the first cancellation event carrying the global address and being used to indicate the release of the memory indicated by the global address; receiving a response from the first node, the response indicating that the memory indicated by the global address has been released; generating a second cancellation event, the second cancellation event carrying the global address and being used to release the memory indicated by the global address; and sending the second cancellation event to the second node.
[0018] In this implementation, the management module releases the memory indicated by the global address only after confirming that the first node has released its occupation of the memory. This ensures the reliability of the memory release process and avoids affecting the business performance of the first node.
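The ordering constraint described above (the second node's memory is released only after the first node confirms its release) can be sketched as follows. The stub classes, method names, and the boolean response are assumptions for illustration; the actual message formats are not specified here.

```python
class BorrowerStub:
    """Hypothetical first node; `busy` simulates memory still in use."""

    def __init__(self, mapped, busy=False):
        self.mapped = set(mapped)
        self.busy = busy

    def on_first_cancellation_event(self, global_address) -> bool:
        if self.busy:
            return False  # no confirmation: memory is still occupied
        self.mapped.discard(global_address)
        return True       # response: memory at global_address was released


class LenderStub:
    """Hypothetical second node tracking the addresses it has lent out."""

    def __init__(self, lent):
        self.lent = set(lent)

    def on_second_cancellation_event(self, global_address) -> None:
        self.lent.discard(global_address)


def cancel_allocation(global_address, borrower, lender) -> bool:
    """Return True if the lender's memory was released."""
    released = borrower.on_first_cancellation_event(global_address)
    if not released:
        return False  # never release memory the borrower still occupies
    lender.on_second_cancellation_event(global_address)
    return True


addr = 0x1000
idle_borrower, lender_a = BorrowerStub([addr]), LenderStub([addr])
busy_borrower, lender_b = BorrowerStub([addr], busy=True), LenderStub([addr])
ok = cancel_allocation(addr, idle_borrower, lender_a)
blocked = cancel_allocation(addr, busy_borrower, lender_b)
```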
[0019] In a second aspect, a memory allocation apparatus is provided, comprising an acquisition module and an allocation module. The acquisition module is configured to acquire a first node identifier, which indicates a first node among a plurality of nodes; the acquisition module is further configured to acquire topology information and memory information, the topology information indicating the connection relationships between at least some of the nodes among the plurality of nodes, and the memory information indicating the memory usage of at least some of the nodes among the plurality of nodes; the acquisition module is further configured to acquire a second node identifier based on the topology information and memory information, the second node identifier indicating a second node among the plurality of nodes; and the allocation module is configured to allocate memory of the second node to the first node.
[0020] In one possible implementation, the memory information includes at least one of the following: memory allocation relationships between different nodes in at least some of the nodes, memory allocation amounts between different nodes, or, for each of at least some of the nodes, its total memory, used memory, unused memory, borrowed memory, or lent memory.
[0021] In one possible implementation, the acquisition module is further configured to acquire latency information, which indicates the duration allowed for the first node to access the allocated memory. The acquisition module is also configured to acquire the identifier of the second node based on the latency information, topology information, and memory information, wherein the duration consumed by the first node to access the second node's memory is less than the duration indicated by the latency information.
[0022] In one possible implementation, the acquisition module is configured to receive a first allocation event, the first allocation event carrying a node identifier and being used to request memory allocation; and to determine the node identifier carried by the first allocation event as the first node identifier.
[0023] In one possible implementation, the first allocation event also carries a semantic identifier, which indicates presentation semantics, and the presentation semantics indicate how the allocated memory is presented. The allocation module is configured to send a second allocation event to the second node, the second allocation event indicating the allocation of memory; receive a global address sent by the second node, the global address indicating the address of the memory of the second node; generate a third allocation event, which carries the global address and the semantic identifier and is used to allocate the memory of the second node and present the allocated memory according to the presentation method; and send the third allocation event to the first node.
[0024] In one possible implementation, the allocation module is also configured to cancel the allocation of the memory of the second node to the first node.
[0025] In one possible implementation, the allocation module is configured to send a first cancellation event to the first node, the first cancellation event carrying the global address and being used to indicate the release of the memory indicated by the global address; receive a response from the first node, the response indicating that the memory indicated by the global address has been released; generate a second cancellation event, the second cancellation event carrying the global address and being used to release the memory indicated by the global address; and send the second cancellation event to the second node.
[0026] In a third aspect, a chip for allocating memory is provided. The chip includes a processor configured to retrieve instructions from a memory and execute them, causing a computer equipped with the chip to execute the method for allocating memory provided in the first aspect or any possible implementation thereof.
[0027] In a fourth aspect, another chip for allocating memory is provided, which includes an input interface, an output interface, a processor, and a memory. The input interface, output interface, processor, and memory are connected through an internal connection path. The processor is configured to execute code in the memory. When the code is executed, the computer equipped with the chip executes the method for allocating memory provided in the first aspect or any possible implementation thereof.
[0028] In a fifth aspect, a device for allocating memory is provided. The device includes a processor coupled to a memory, the memory storing at least one instruction, the at least one instruction being loaded and executed by the processor to cause the device to perform the method for allocating memory provided in the first aspect or any possible implementation thereof.
[0029] Optionally, there may be one or more processors and one or more memories.
[0030] Optionally, the memory can be integrated with the processor, or the memory can be provided separately from the processor.
[0031] In a sixth aspect, a computer-readable storage medium is provided, wherein at least one instruction is stored in the computer-readable storage medium, the instruction being loaded and executed by a processor to implement the method of allocating memory provided by the first aspect or any possible implementation thereof.
[0032] In a seventh aspect, a computer program or computer program product is provided, comprising computer programs or instructions that are executed by a processor to enable a computer to implement the method for allocating memory provided in the first aspect or any possible implementation thereof.
[0033] It should be understood that, for the technical effects achieved by the technical solutions of the second to seventh aspects of this application and their corresponding possible implementations, reference can be made to the technical effects of the first aspect and its corresponding possible implementations described above, which will not be repeated here.
Attached Figure Description
[0034] Figure 1 is a schematic diagram of a related technology provided in an embodiment of this application;
[0035] Figure 2 is a schematic diagram of another related technology provided in an embodiment of this application;
[0036] Figure 3 is a schematic diagram of an implementation environment provided in an embodiment of this application;
[0037] Figure 4 is a flowchart of a memory allocation method provided in an embodiment of this application;
[0038] Figure 5 is a flowchart of obtaining memory information according to an embodiment of this application;
[0039] Figure 6 is a flowchart of obtaining topology information according to an embodiment of this application;
[0040] Figure 7 is a flowchart of a memory borrowing method provided in an embodiment of this application;
[0041] Figure 8 is a flowchart of a memory access method provided in an embodiment of this application;
[0042] Figure 9 is a schematic diagram illustrating non-uniform memory access (NUMA) semantics provided in an embodiment of this application;
[0043] Figure 10 is a schematic diagram illustrating the presentation of device semantics provided in an embodiment of this application;
[0044] Figure 11 is a schematic diagram illustrating the presentation of file semantics provided in an embodiment of this application;
[0045] Figure 12 is a flowchart of a memory return process provided in an embodiment of this application;
[0046] Figure 13 is a schematic diagram of a memory allocation apparatus provided in an embodiment of this application;
[0047] Figure 14 is a schematic diagram of the structure of a memory allocation device provided in an embodiment of this application.
Detailed Implementation
[0048] The terminology used in the implementation section of this application is for the purpose of explaining specific embodiments of this application only, and is not intended to limit this application.
[0049] As business data volumes continue to increase, the demands on the computing power and memory capacity of nodes (physical machines, such as servers) are also constantly growing. Memory costs may account for more than 50% of the total cost of a node. In various scenarios, including cloud, internet, virtualization, big data, and database environments, low cost and high cost-effectiveness are required. Therefore, at least the following issues exist regarding memory.
[0050] On the one hand, the fluctuations in memory usage lead to memory waste. Nodes exhibit peaks and troughs in memory usage, requiring larger memory capacities at peaks and smaller ones at troughs. While configuring memory based on the larger memory capacity required at peaks may help meet service level agreements (SLAs), it results in memory waste.
[0051] On the other hand, the existence of the input/output (I/O) wall leads to a decline in business performance. Specifically, when the amount of business data corresponding to a node is large, the limited memory capacity of a single node cannot hold all the data. External storage (i.e., storage other than memory, such as disks) is needed to store the data, and accessing external storage is slow. This phenomenon is called the I/O wall. When the scale of business data grows exponentially, the memory capacity may be orders of magnitude smaller than the business data volume, requiring more external storage, which may seriously affect business performance. For example, if the memory is on the order of terabytes (TB) while the business data volume is on the order of petabytes (PB), where 1 PB = 1024 TB, the memory cannot store all the data, and some data may overflow from memory to disk (i.e., be written to disk), resulting in a severe decline in business performance.
[0052] Memory borrowing technology can improve upon the two issues mentioned above. Memory borrowing refers to the process of borrowing memory from other nodes when the node's memory capacity is insufficient. Based on memory borrowing technology, it is not necessary to configure memory according to the larger memory capacity required at peak times; instead, memory can be configured according to the average memory capacity required at peak and trough times, thus avoiding memory waste. Furthermore, when a node's memory capacity is insufficient, that node can borrow memory to prevent data from overflowing from memory to disk, thereby preventing a decline in business performance.
[0053] Referring to Figure 1, the memory borrowing technology provided by Related Technology 1 is implemented through memory tiering. This means that each node includes a high-speed Level 1 cache and a low-speed Level 2 cache. The Level 2 cache is, for example, a persistent storage medium such as a solid-state disk (SSD). Furthermore, data is divided into hot data and cold data. Hot data stored in the Level 1 cache is retained there, while cold data stored in the Level 1 cache is swapped out to the Level 2 cache. If cold data is accessed again by the application, it is swapped back into the Level 1 cache. This combination of Level 1 and Level 2 caches expands the available memory capacity. However, in Related Technology 1, when cold data is accessed, it is necessary to first swap the cold data stored in the Level 2 cache into the Level 1 cache before accessing the cold data based on the Level 1 cache. Therefore, access to cold data is slow, resulting in poor business performance.
[0054] Referring to Figure 2, the memory borrowing technology provided by Related Technology 2 offers a centralized memory pool deployed on an independent memory node connected to the service nodes. When a service node's memory capacity is insufficient, it can borrow memory from the memory node. However, Related Technology 2 requires the introduction of an independent memory node and the connection between the memory node and the service nodes, resulting in high costs. Furthermore, if a memory node fails, the failure will propagate to all service nodes borrowing memory from that memory node, leading to poor reliability.
[0055] To address these issues, this application provides a method for allocating memory to improve or overcome the problems existing in related technologies. Before describing the method provided in this application, the implementation environment of this application will be described to facilitate understanding.
[0056] As shown in Figure 3, the cluster includes multiple nodes, namely nodes 1 to N, where N is a positive integer greater than 1. These nodes are interconnected, forming a specific topology. Optionally, the cluster can be a computing cluster, where each node is deployed in a rack and connected via a bus. The computing cluster can provide various computing services, such as general computing services and artificial intelligence (AI) computing services. Agent modules are deployed on all or some of the nodes, with at least two nodes having agent modules deployed. The agent modules can be, for example, in software form; this embodiment does not limit the specific form of the agent modules.
[0057] A node includes a processor and memory. The processor is, for example, at least one of a central processing unit (CPU) or a graphics processing unit (GPU). When a node includes a CPU but not a GPU, the node is a non-heterogeneous node; when a node includes both a CPU and a GPU, the node is a heterogeneous node. The memory allocation process in this embodiment can occur between heterogeneous nodes. For example, memory corresponding to the CPU of another heterogeneous node can be allocated to the GPU of one heterogeneous node. Of course, the memory allocation process can also occur between non-heterogeneous nodes, or between non-heterogeneous nodes and heterogeneous nodes; this embodiment does not limit this.
[0058] The memory contained in multiple nodes can form a memory pool, which is a distributed memory pool. For example, for multiple nodes connected by a high-speed interconnect bus (an example of a bus), each node contributes a certain proportion (the value of which is not limited) of its total memory to the memory pool. Referring to Figure 3, the memory pool can include memory 2 in node 1 and memory 4 in node N. The high-speed interconnect bus has characteristics such as high bandwidth, low latency, and cross-node memory semantics, such as load/store semantics. Compared to send/receive commands in network communication technology or read/write commands in remote direct memory access (RDMA) technology, load/store semantics provide load/store commands that do not require link establishment (i.e., connection creation) or page copying between different nodes, and simplify the message structure used in cross-node memory access (e.g., shortening the message header length), thereby enabling fast cross-node memory access. Optionally, the high-speed interconnect bus includes, but is not limited to, a Compute Express Link (CXL) bus.
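As a small worked example of how a distributed memory pool might be formed from per-node contributions, consider the sketch below. The function name `build_memory_pool`, the uniform 25% share, and the node sizes are assumptions chosen for illustration; the text does not limit the proportion.

```python
def build_memory_pool(total_memory_gb, share=0.25):
    """Map each node to the amount of memory (GB) it contributes to the pool.

    `share` is an assumed, uniform proportion of each node's total memory.
    """
    return {node: total * share for node, total in total_memory_gb.items()}


# Hypothetical cluster: node-1 has 512 GB in total, node-N has 256 GB.
pool = build_memory_pool({"node-1": 512, "node-N": 256})
pool_capacity = sum(pool.values())
```

Here node-1 contributes 128 GB and node-N contributes 64 GB, so the pool holds 192 GB while the remaining memory of each node stays local.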
[0059] Additionally, the portion of the node's total memory that does not form a memory pool can be used to store operating system (OS) information and application information. The processor can read OS information from memory and run the OS within the node based on that information. The processor can also read application information from memory and run applications within the running OS based on that information.
[0060] The agent module is used to communicate with the management module corresponding to the cluster. For example, the agent module provides an interface that the application described above can call to communicate with the management module through the agent module. Through the communication between the agent module and the management module, the management module can manage the memory pool. Thus, the agent module and the management module constitute the control plane of the memory allocation system, and the memory pool constitutes the data plane of the memory allocation system. For example, when a node in the cluster (denoted as node A) needs to borrow memory, another suitable node (denoted as node B) is selected to lend memory to node A, that is, memory of node B is allocated to node A. Node A is the memory borrower, and node B is the memory lender. A certain proportion of the total memory of node B is located in the memory pool. The management module may be in software form, but this application embodiment does not limit this. The management module can be deployed on a node included in the cluster or on a node outside the cluster. Nodes outside the cluster include, but are not limited to, switches, compute nodes, or management nodes.
[0061] Referring again to Figure 3, the management module includes an information acquisition and management unit and a memory allocation decision unit. The information acquisition and management unit interacts with the agent module to collect and manage (e.g., update) various information such as memory information and topology information. Memory information indicates the memory usage of at least some of the nodes, and topology information indicates the connection relationships between the nodes. The information acquisition and management unit also interacts with the memory allocation decision unit to report the collected and managed information. The memory allocation decision unit determines how to allocate memory based on this information, achieving memory allocation through interaction with the agent module. For example, if node A is the memory borrower and node B is the memory lender, as described above, the memory allocation decision unit interacts with the agent modules deployed on node A and node B respectively to allocate memory from node B to node A. Optionally, node A can present the borrowed memory to the upper layer (e.g., the OS or application) according to certain semantics, where semantics include, but are not limited to, NUMA semantics, device semantics, and file semantics. The method embodiments below will further describe the memory allocation process in detail, which will not be repeated here.
[0062] This application provides a method for allocating memory, which can be applied to a management module corresponding to a cluster, where the cluster includes multiple nodes. As shown in Figure 4, the method includes the following steps 401 to 404.
[0063] Step 401: Obtain the first node identifier, which indicates the first node among multiple nodes.
[0064] The number of first nodes can be one or more; the following explanation uses a single first node as an example. A first node is a node that needs to borrow memory. The reason a first node needs to borrow memory is, for example, insufficient local memory capacity. For instance, node memory usage fluctuates, with peaks and troughs. During peak periods, due to the need for larger memory capacities, the first node's local memory capacity may be insufficient. If the first node does not borrow memory when its local memory capacity is insufficient, it may need to transfer data stored in its local memory to its local disk. Since disk access speed is slower than memory access speed, when the first node needs to access the disk to access data during service provision, it may impact service performance. Therefore, borrowing memory when the first node's local memory capacity is insufficient can avoid impacting service performance. Of course, insufficient memory here is just an example; the first node may need to borrow memory for other reasons as well.
[0065] In an exemplary embodiment, the management module obtains the first node identifier by: receiving a first allocation event, the first allocation event carrying a node identifier, the first allocation event being used to request memory allocation; and determining the node identifier carried in the first allocation event as the first node identifier. When the management module and the sender of the first allocation event are deployed on the same node, the management module can receive the first allocation event via the bus inside that node. Alternatively, when the management module and the sender of the first allocation event are deployed on two different nodes, the management module receives the first allocation event via a bus between the two nodes (e.g., a high-speed interconnect bus).
[0066] For example, the sender of the first allocation event includes, but is not limited to, the following:
[0067] The first type of sender is the OS deployed on the first node. The OS can obtain the node metrics of the first node, and generate a first allocation event when the node metrics are greater than or equal to the metric threshold, and send the first allocation event to the management module. For example, the node metrics include the memory utilization rate of the first node. A node metric greater than or equal to the metric threshold means that the memory utilization rate of the first node is greater than or equal to the memory utilization rate threshold. This situation can also be called out of memory (OOM). The embodiments of this application do not limit the value of the memory utilization rate threshold.
[0068] The second type of sender is the application deployed on the first node, which can run on the OS. For example, if the application needs to consume a lot of memory during its operation, it can generate a first allocation event after starting to run and send the first allocation event to the management module.
[0069] The third type of sender is the management node of the first node. This management node can be used by an administrator to manage the first node. The administrator can monitor the first node through the management node and, upon determining from the monitored information (such as the node metrics of the first node) that the first node needs to borrow memory, request memory for the first node. For example, when the administrator performs a memory borrowing operation, the management node, in response to detecting the memory borrowing operation, generates a first allocation event and sends the first allocation event to the management module.
[0070] The fourth type of sender is the cloud management platform of the first node. This cloud management platform is used to automatically manage the first node and can be deployed on the management node of the first node. The management node with the cloud management platform deployed can be the same as or different from the management node mentioned in the third type of sender description; this application embodiment does not limit this. The cloud management platform can automatically monitor the first node. If, based on the monitored conditions (e.g., node metrics of the first node), it determines that the first node needs to borrow memory, it generates a first allocation event and sends the first allocation event to the management module.
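The threshold-based trigger used by the first type of sender (and, in a similar way, by an automated monitoring platform) can be sketched as follows. The function name, the dictionary layout of the event, and the 0.9 threshold are assumptions for illustration; the text does not limit the threshold value.

```python
def maybe_request_memory(node_id, used_gb, total_gb, threshold=0.9):
    """Return a first allocation event (as a dict) or None.

    An event is generated when memory utilization is greater than or
    equal to the (assumed) utilization threshold.
    """
    utilization = used_gb / total_gb
    if utilization >= threshold:
        return {"event": "first_allocation", "node_id": node_id}
    return None


high = maybe_request_memory("node-1", used_gb=58, total_gb=64)
low = maybe_request_memory("node-1", used_gb=32, total_gb=64)
```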
[0071] Regardless of the sender from which the management module receives the first allocation event, it can parse the first allocation event to obtain the node identifier carried by the first allocation event, and determine the node identifier as the first node identifier. In other words, the node indicated by the node identifier is determined as the first node that needs to borrow memory, and the first node is the memory borrower. For example, in addition to carrying the node identifier, the first allocation event can also carry other types of identifiers, which will be explained below.
[0072] The first type of identifier is the event identifier, which indicates that the event is used to borrow memory.
[0073] Therefore, the management module parses the first allocation event and obtains the event identifier. Based on the event identifier, it determines that the first allocation event is an event used to borrow memory, thereby triggering the allocation of memory to the first node to realize the memory borrowing of the first node. The process of the management module allocating memory to the first node is detailed in step 404 below, and will not be elaborated here.
[0074] In this embodiment, using an event identifier to indicate memory borrowing is merely an example. The management module can also provide an interface where the sender of the first allocation event calls the interface to send the first allocation event. After receiving the first allocation event through the interface, the management module assumes that the first allocation event is an event for borrowing memory. Thus, even if the first allocation event does not carry an event identifier, the management module can still determine that the first allocation event is an event for borrowing memory, thereby triggering the allocation of memory for the first node.
[0075] The following text also mentions various other events such as the second allocation event and the third allocation event. Optionally, each event can carry an event identifier to indicate the function of the event. The event identifiers carried in different events can have different values. Different values can be obtained through configuration or negotiation. This application embodiment does not limit this, and the event identifiers will not be described in detail below.
[0076] The second type of identifier is the memory size identifier. The memory size identifier indicates the amount of memory that the first node needs to borrow.
[0077] Therefore, by parsing the first allocation event, the management module can also obtain a memory size identifier. Based on the memory size identifier, it determines the amount of memory that the first node needs to borrow, so as to determine the second node that lends memory to the first node. The process of the management module determining the second node is detailed in step 403 below and will not be elaborated here.
[0078] In this embodiment, using a memory size identifier to indicate the amount of memory the first node needs to borrow is merely an example. The management module can also default to a fixed size for the memory the first node needs to borrow, which can be configured or negotiated. For instance, taking a fixed size of 2 gigabytes (GB) as an example, each time the management module receives a first allocation event, it defaults to the first node needing to borrow 2GB of memory. If the first node needs to borrow more than 2GB of memory, it can send multiple first allocation events. Thus, even if the first allocation event does not carry a memory size identifier, the management module can still determine the amount of memory the first node needs to borrow.
[0079] The third type of identifier is the semantic identifier. The semantic identifier indicates the presentation semantics, which in turn indicate how the allocated memory is presented.
[0080] Therefore, the management module can also obtain a semantic identifier by parsing the first allocation event. The presentation semantics indicated by this semantic identifier describe how the first node should present the borrowed memory to the application after borrowing it. Optionally, the presentation semantics include, but are not limited to, NUMA semantics, device semantics, and file semantics. The process of the first node presenting the borrowed memory according to the presentation semantics is detailed in step 404 below, and will not be elaborated here.
[0081] Optionally, if the sender of the first allocation event is not deployed on the first node, including but not limited to the third or fourth type of sender mentioned above, the first allocation event also carries a semantic identifier. Therefore, the management module can send this semantic identifier to the first node during the subsequent memory allocation process, so that the first node can present the borrowed memory to the application according to the presentation semantics indicated by the semantic identifier.
[0082] In this embodiment, the use of a semantic identifier in the first allocation event is merely an example. For instance, the first node may also present borrowed memory to the application by default according to a certain presentation semantic, which can be obtained through the configuration of the first node. Thus, even if the first allocation event does not carry a semantic identifier, the first node can still present borrowed memory.
[0083] In addition, when the sender of the first allocation event is deployed on the first node, including but not limited to the first type of sender or the second type of sender mentioned above, the first allocation event may or may not carry a semantic identifier, and this application embodiment does not limit this.
[0084] The fourth type of identifier is the permission identifier. The permission identifier indicates the permissions granted for borrowing memory.
[0085] Therefore, the management module can also obtain a permission identifier by parsing the first allocation event. The permission identifier indicates that if the first node is subsequently able to borrow memory, then the first node has at least one of the permissions: read permission or write permission, to the borrowed memory. The process of the first node accessing the borrowed memory according to its permissions is detailed in step 404 below, and will not be elaborated here.
[0086] Optionally, if the sender of the first allocation event is not deployed on the first node, the first allocation event also carries a permission identifier. Therefore, the management module can send this permission identifier to the first node during subsequent memory allocation, allowing the first node to access the borrowed memory according to the permissions indicated by the permission identifier. Alternatively, the first node can also access the borrowed memory with certain permissions by default, which can be obtained through configuration of the first node. Thus, even if the first allocation event does not carry a permission identifier, the first node can still determine permissions.
[0087] Furthermore, when the sender of the first allocation event is deployed on the first node, the first allocation event may or may not carry a permission identifier, and this application embodiment does not limit this.
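By way of a non-limiting illustration, the parsing of the first allocation event and the fallback to defaults when optional identifiers are absent can be sketched as follows. The dictionary encoding of the event, the field names (`node_id`, `size_bytes`, `semantic`, `permissions`), and the function name are assumptions made only for this sketch; the embodiments do not prescribe an event format.

```python
from dataclasses import dataclass
from typing import Optional

GIB = 1024 ** 3
# Fixed default size, taken from the 2 GB example in the description.
DEFAULT_BORROW_BYTES = 2 * GIB


@dataclass
class BorrowRequest:
    node_id: str                     # first node identifier
    size_bytes: int                  # amount of memory to borrow
    semantic: Optional[str]          # None: the first node uses its configured default
    permissions: Optional[frozenset] # None: the first node uses its configured default


def parse_first_allocation_event(event: dict) -> BorrowRequest:
    """Extract the identifiers carried by a first allocation event.

    The event is modelled as a plain dict (hypothetical encoding).
    """
    perms = event.get("permissions")
    return BorrowRequest(
        node_id=event["node_id"],
        size_bytes=event.get("size_bytes", DEFAULT_BORROW_BYTES),
        semantic=event.get("semantic"),
        permissions=frozenset(perms) if perms is not None else None,
    )
```

In this sketch only the node identifier is mandatory; a missing memory size identifier falls back to the fixed size, and a missing semantic or permission identifier is left for the first node to resolve from its own configuration.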
[0088] The preceding text described the scenario where the management module obtains the first node identifier upon receiving the first allocation event. Alternatively, the management module can determine the first node identifier itself. Optionally, the management module obtains node metrics for each of the multiple nodes, determines the node that needs to borrow memory based on these metrics, and uses the node identifier indicating that node as the first node identifier. For example, the memory of each node may consist of multiple memory pages, and the node metrics may include the number of non-free memory pages (which are memory pages already allocated to upper-layer applications or the OS). The management module obtains the number of non-free memory pages for each of the multiple nodes. If the number of non-free memory pages of a node is greater than or equal to the watermark, that node needs to borrow memory; the management module treats it as the first node and thereby obtains the first node identifier. Exemplarily, the management module can also obtain any one or more other identifiers mentioned above based on the node metrics.
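As a minimal sketch of the watermark check described above, assuming the management module holds a mapping from node identifier to non-free page count (the function and parameter names are hypothetical):

```python
from typing import Dict, List


def nodes_needing_memory(non_free_pages: Dict[str, int], watermark: int) -> List[str]:
    """Return identifiers of nodes whose non-free page count reaches the watermark."""
    return sorted(
        node for node, pages in non_free_pages.items() if pages >= watermark
    )
```

Each identifier returned would be a candidate first node identifier; how the watermark itself is chosen is outside this sketch.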
[0089] Therefore, the embodiments of this application can trigger the management module to obtain the first node identifier and initiate the memory allocation process through various means such as the OS, an application, the management node, the cloud management platform, or the watermark. Optionally, regardless of the method by which the management module obtains the first node identifier, the management module can perform permission authentication after obtaining the first node identifier to determine whether the first node indicated by the first node identifier has the permission to borrow memory. If the first node has the permission, memory can be allocated to the first node subsequently; if the first node does not have the permission, no memory needs to be allocated to the first node, thus ensuring the security and reliability of the memory allocation process.
[0090] Step 402: Obtain topology information and memory information. The topology information indicates the connection relationship between at least some of the nodes among the multiple nodes, and the memory information indicates the memory usage of at least some of the nodes among the multiple nodes.
[0091] The memory information and topology information will be explained below.
[0092] Memory information indicates the memory usage of at least some nodes among multiple nodes, such as the memory usage of all or some nodes. By considering memory usage, the node most suitable for lending memory can be identified as the second node, ensuring that the memory lent by the second node matches its memory usage, thus preventing the second node from affecting its local memory usage due to memory lending.
[0093] Optionally, as shown in Figure 5, agent modules deployed on multiple nodes periodically obtain the memory information of the nodes and report the memory information to the management module. The management module receives and stores (either persistent or non-persistent storage) the memory information reported by each agent module, thereby realizing real-time updates of memory information.
[0094] In an exemplary embodiment, the memory information includes at least one of the following.
[0095] The memory allocation relationship between different nodes is also known as a borrowing and lending relationship. Taking multiple nodes, including node A, node B, and node C, as an example, the memory allocation relationship can be that node A borrows memory from node B, node B borrows memory from node C, and node C borrows memory from node A. In this embodiment, the node that has a borrowing and lending relationship with the first node can be preferentially identified as the second node.
[0096] The memory allocation amount between different nodes refers to the amount of memory borrowed and the amount of memory lent out. Taking multiple nodes, including node A, node B, and node C, as an example, the memory allocation amount can be the amount of memory borrowed by node A from node B, the amount of memory borrowed by node B from node C, and the amount of memory borrowed by node C from node A. In this embodiment, the node with the smaller memory allocation amount can be preferentially designated as the second node.
[0097] Optionally, by considering the memory allocation relationships and amounts among different nodes, the management module can determine a second node to achieve balanced memory borrowing. For example, taking multiple nodes including node A, node B, and node C as an example, if node A has already borrowed memory from node B and later needs to borrow memory again, node C can be determined as the second node, so that node B and node C lend memory to node A in a balanced manner.
[0098] Regarding memory allocation relationships and memory allocation amounts, since the management module manages the memory allocation process between different nodes, it can detect changes in these relationships and amounts. Therefore, the management module updates these relationships and amounts in real time when changes occur to ensure real-time performance and accuracy. For example, when a node borrows memory or cancels a memory loan (i.e., returns memory), the management module can detect these changes and update them accordingly.
[0099] The total memory of each of the multiple nodes refers to the total amount of memory stored locally on that node. In this embodiment, the node with the larger total memory can be designated as the second node.
[0100] The used memory of each of the multiple nodes refers to the memory within the node's local memory that is used by the node itself. This used memory does not include the node's borrowed memory. In this embodiment, the node with the smaller used memory can be designated as the second node.
[0101] The borrowed memory of each of the multiple nodes refers to the amount of memory that the node has borrowed from other nodes, which are the nodes other than the node itself. If a node has borrowed memory from multiple other nodes, then the node's borrowed memory can be the sum of the amounts of memory borrowed from each of those other nodes. In this embodiment, the node with the smaller borrowed memory can be designated as the second node.
[0102] The lent memory of each of the multiple nodes refers to the amount of memory that the node has lent to other nodes. If a node has lent memory to multiple other nodes, then the node's lent memory can be the sum of the amounts of memory lent to each of those other nodes. In this embodiment, the node with the smaller lent memory can be designated as the second node.
[0103] The unused memory of each of the multiple nodes refers to the unused memory in the node's local memory. The sum of a node's used memory, lent memory, and unused memory equals the node's total memory. In this embodiment, the node with the larger amount of unused memory can be preferentially identified as the second node.
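The memory allocation relationships and amounts listed above, and their update when a node borrows or returns memory, can be illustrated with a small ledger. This is a sketch only; the class and method names are hypothetical and sizes are expressed in arbitrary units.

```python
class BorrowLedger:
    """Tracks which node borrows from which node, and how much."""

    def __init__(self):
        # (borrower, lender) -> amount currently on loan
        self._amounts = {}

    def record_borrow(self, borrower, lender, size):
        key = (borrower, lender)
        self._amounts[key] = self._amounts.get(key, 0) + size

    def record_return(self, borrower, lender, size):
        key = (borrower, lender)
        outstanding = self._amounts.get(key, 0)
        if size > outstanding:
            raise ValueError("cannot return more memory than is on loan")
        if size == outstanding:
            # Nothing left on loan: the borrowing relationship ends.
            self._amounts.pop(key, None)
        else:
            self._amounts[key] = outstanding - size

    def borrowed_by(self, node):
        """Total amount the node has borrowed from all other nodes."""
        return sum(v for (b, _), v in self._amounts.items() if b == node)

    def lent_by(self, node):
        """Total amount the node has lent to all other nodes."""
        return sum(v for (_, l), v in self._amounts.items() if l == node)

    def relationships(self):
        """Sorted list of (borrower, lender) pairs with memory on loan."""
        return sorted(self._amounts)
```

Because the management module mediates every borrow and return, a structure of this kind can be kept current without polling the nodes.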
[0104] When memory information includes multiple pieces of information, for each node other than the first node (i.e., each node that could potentially become the second node), the management module can determine the score corresponding to each type of information for that node, obtaining multiple scores. The weighted average of these scores is taken as the score for that node, and the second node is determined based on the scores of each node. For example, if memory information includes the total memory and unused memory of each node, a node with a larger total memory will have a higher score A, and a node with a larger unused memory will have a higher score B. The weighted average of scores A and B is recorded as score C, and the second node is determined from the multiple nodes based on score C.
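A minimal sketch of such weighted scoring follows. The description does not specify how the individual scores are computed, so normalising each metric by the largest value among the candidates is an assumption of this sketch, as are the function name and the metric keys `total` and `unused`.

```python
def pick_second_node(candidates, weights):
    """Score each candidate node and return (best_node, scores).

    candidates: node id -> {metric name: value}, larger value is better.
    weights:    metric name -> weight used in the weighted average.
    """
    def normalise(metric):
        # Assumption: score = value / largest value among candidates.
        top = max(info[metric] for info in candidates.values())
        return {
            node: (info[metric] / top if top else 0.0)
            for node, info in candidates.items()
        }

    per_metric = {metric: normalise(metric) for metric in weights}
    weight_sum = sum(weights.values())
    scores = {
        node: sum(weights[m] * per_metric[m][node] for m in weights) / weight_sum
        for node in candidates
    }
    best = max(scores, key=scores.get)
    return best, scores
```

For instance, with node B at total 100 and unused 20, node C at total 80 and unused 60, and weights of 1 for total memory and 3 for unused memory, node C obtains the higher combined score and would be selected.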
[0105] Topology information indicates the connections between at least some of the nodes in a set of nodes, such as the connections between all or some of the nodes. By considering these connections, it is possible to determine which nodes are connected to the first node from among the multiple nodes, ensuring that the first node and the second node are connected nodes in the set, so that the first node can borrow memory from the second node.
[0106] Optionally, as shown in Figure 6, the management module statically configures topology information. Based on this, the management module updates the topology information through interaction with the agent module. For example, when a node is added or removed from the cluster, or when communication connections between different nodes are added or broken, the topology formed by the nodes in the cluster is updated, thus requiring an update of the topology information. In one example, the management module actively sends a query event to the agent module to check whether the topology corresponding to the node where the agent module is located has been updated (equivalent to a handshake). The agent module reports the topology update status back to the management module based on the query event, causing the management module to update the topology information. In another example, the agent module can actively send an update event to the management module when it detects a topology update, to report the topology update status and cause the management module to update the topology information. In yet another example (not shown in Figure 6), due to the topology update, the node where the agent module is located may experience a communication failure. For example, if the communication connection between the node where the agent module is located and other nodes is broken, the node where the agent module is located experiences a communication failure, and the agent module sends a failure event to the management module to report the communication failure. The management module sends a query event to the agent module based on the communication failure. The query event is detailed above and will not be repeated here.
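As a non-limiting illustration, the topology information can be held as an adjacency structure that is configured statically and then modified by update events reported by the agent modules. The event encoding (`kind`, `a`, `b`, `node`) and the class name are assumptions of this sketch.

```python
class Topology:
    """Undirected connection relationships between nodes."""

    def __init__(self, links=()):
        self.adj = {}
        for a, b in links:  # static configuration
            self.add_link(a, b)

    def add_link(self, a, b):
        self.adj.setdefault(a, set()).add(b)
        self.adj.setdefault(b, set()).add(a)

    def remove_link(self, a, b):
        self.adj.get(a, set()).discard(b)
        self.adj.get(b, set()).discard(a)

    def remove_node(self, node):
        for peer in self.adj.pop(node, set()):
            self.adj.get(peer, set()).discard(node)

    def apply(self, event):
        """Apply one topology update event (hypothetical dict encoding)."""
        kind = event["kind"]
        if kind == "link_added":
            self.add_link(event["a"], event["b"])
        elif kind == "link_broken":
            self.remove_link(event["a"], event["b"])
        elif kind == "node_removed":
            self.remove_node(event["node"])
        else:
            raise ValueError(f"unknown update kind: {kind}")

    def neighbours(self, node):
        """Nodes directly connected to the given node."""
        return set(self.adj.get(node, set()))
```

The same `apply` path can serve both the query-and-report exchange and the update events sent proactively by an agent module.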
[0107] Step 403: Obtain the second node identifier based on the topology information and memory information. The second node identifier indicates the second node among multiple nodes.
[0108] In this embodiment, since the second node identifier is obtained based on topology information and memory information, and considering the explanation of topology information and memory information in step 402 above, it can be seen that the second node is connected to the first node, and the acquisition of the second node identifier is related to the memory information. The number of second nodes can be one or more; this embodiment does not limit the number of second nodes. The second node is a node that lends out memory. The connection between the second node and the first node ensures that after memory is allocated to the first node, the first node can normally access the memory of the second node. For example, the second node and the first node are connected via a high-speed interconnect bus, and the first node directly accesses the memory of the second node through the load / store semantics. The acquisition of the second node identifier is related to memory information, and the memory information indicates memory usage, ensuring that the management module can reasonably and accurately determine the second node based on the memory usage of at least some of the multiple nodes, thereby obtaining the second node identifier.
[0109] In an exemplary embodiment, the management module obtains the second node identifier based on topology information and memory information, including: the management module determines, based on the topology information, a node connected to the first node from a plurality of nodes; the management module determines, based on the memory information, a second node from the nodes connected to the first node, and the second node identifier is an identifier indicating the second node. Alternatively, the management module obtains the second node identifier based on topology information and memory information, including: the management module determines all or some nodes from a plurality of nodes based on the memory information; the management module further determines, based on the topology information, a node connected to the first node from the determined all or some nodes, thus obtaining the second node, and the second node identifier is an identifier indicating the second node. The method by which the management module determines the node based on the memory information can be seen in the example in step 402 above, and will not be repeated here.
[0110] For example, since the second node is the node that lends memory, the amount of memory that the second node can lend should be greater than or equal to the amount of memory that the first node needs to borrow. Referring to the explanation in step 401 above, the amount of memory that the first node needs to borrow can be a fixed size or the size indicated by the memory size identifier; this embodiment does not limit this. When there are multiple second nodes, the sum of the amounts of memory that the multiple second nodes can lend is greater than or equal to the amount of memory that the first node needs to borrow. The amounts of memory that different second nodes can lend can be the same or different.
[0111] Based on this, the management module can also obtain the amount of memory that the first node needs to borrow. Based on the amount of memory to be borrowed, topology information, and memory information, it can determine the second node from the nodes connected to the first node. This ensures that the second node is connected to the first node, that the selection matches the memory usage of the second node, and that the amount of memory that the second node can lend is greater than or equal to the amount of memory that the first node needs to borrow.
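The selection of one or more second nodes can be sketched as follows: first restrict the candidates to nodes connected to the first node, then pick lenders until their combined lendable memory covers the requested amount. The greedy largest-first order is an assumption of this sketch, not a requirement of the embodiments, and the names are hypothetical.

```python
def choose_lenders(first, needed, topology, lendable):
    """Return {second node: amount to lend}, or None if the need cannot be met.

    topology: node id -> set of directly connected node ids.
    lendable: node id -> amount of memory the node can lend.
    """
    neighbours = topology.get(first, set())
    # Largest lendable amount first; node id breaks ties deterministically.
    ranked = sorted(
        (n for n in neighbours if lendable.get(n, 0) > 0),
        key=lambda n: (-lendable[n], n),
    )
    plan, remaining = {}, needed
    for node in ranked:
        if remaining <= 0:
            break
        take = min(lendable[node], remaining)
        plan[node] = take
        remaining -= take
    return plan if remaining <= 0 else None
```

A node that is not connected to the first node is never selected, however much memory it could lend.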
[0112] In an exemplary embodiment, the second node identifier is also associated with latency information, which indicates the duration allowed for the first node to access the allocated memory (i.e., the borrowed memory). Put differently, the latency information represents the allowed duration for the first node to access the borrowed memory if it is subsequently able to borrow memory. Optionally, the latency information can be a single time value, a time range, or a latency level, with each latency level corresponding to a time value or time range. This embodiment does not limit the form of the latency information.
[0113] In other words, the management module can also obtain latency information. Accordingly, it obtains the second node identifier based on topology and memory information, including: obtaining the second node identifier based on latency, topology, and memory information (and possibly the size of the memory the first node needs to borrow), so that the time consumed by the first node to access the memory of the second node is less than the time indicated by the latency information, thereby ensuring that the time consumed by the first node to access the memory of the second node is within the allowable range, and thus avoiding impact on the business performance of the first node. The time consumed by the first node to access the memory of the second node is positively correlated with the distance between the first and second nodes. The distance between the first and second nodes can be physical distance; the greater the physical distance, the greater the access time. The distance between the first and second nodes can also be the number of hops; the greater the number of hops, the greater the access time. Based on this, in embodiments of this application, when the allowable time of the latency information is short, a node that is physically closer to the first node or has a smaller number of hops can be selected as the second node.
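By way of illustration, when distance is measured in hops, the latency constraint can be applied by computing hop counts from the first node and keeping only nodes whose estimated access time is below the allowed duration. Modelling access time as hops multiplied by a fixed per-hop cost is an assumption of this sketch; the description only states that access time is positively correlated with distance.

```python
from collections import deque


def hop_counts(topology, source):
    """Breadth-first search: node id -> number of hops from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in topology.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def within_latency(topology, first, per_hop_ns, budget_ns):
    """Nodes whose estimated access time is strictly below the budget."""
    return sorted(
        node
        for node, hops in hop_counts(topology, first).items()
        if node != first and hops * per_hop_ns < budget_ns
    )
```

The nodes returned here would then be ranked using the memory information, so that the latency filter narrows the candidate set rather than choosing the second node on its own.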
[0114] Step 404: Allocate memory of the second node to the first node.
[0115] Since the first node is the memory borrower and the second node is the memory lender, the management module allocates memory from the second node to the first node so that the first node can borrow memory from the second node. After borrowing memory from the second node, the first node can read data from the memory in the second node using the load instruction provided by the load semantics, or write data to the memory in the second node using the store instruction provided by the store semantics.
[0116] In an exemplary embodiment, allocating memory of the second node to the first node includes: sending a second allocation event to the second node, the second allocation event indicating memory allocation; receiving a global address sent by the second node, the global address indicating the address of the second node's memory; generating a third allocation event, the third allocation event carrying the global address, the third allocation event being used to allocate memory of the second node; and sending the third allocation event to the first node. The global address is unique within the cluster, and each node in the cluster and the corresponding management module can identify the global address. The global address can be a virtual address.
[0117] As shown in Figure 7, the management module receives a first allocation event and determines a second node to lend memory based on topology information, memory information, and latency information. The management module then sends a second allocation event to the agent module of the second node. This second allocation event can carry the size of the memory that the second node needs to lend, or it can be used for the second node to lend a fixed amount of memory. The agent module of the second node receives the second allocation event, determines the memory to be lent locally based on it, establishes a second correspondence (a mapping) between the physical address of the lent memory on the second node and the global address, and returns the global address to the management module. The management module receives the global address and generates a third allocation event based on it. For example, the management module generates a third allocation event based on the global address, permission identifier, and semantic identifier, and sends the third allocation event to the agent module of the first node. The agent module of the first node receives the third allocation event and establishes a first correspondence between the local physical address of the first node and the global address. Optionally, the management module can update the above memory allocation relationship and memory allocation amount after sending the third allocation event to the agent module of the first node.
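The exchange of the second and third allocation events can be sketched as follows. The way the global address is formed (a node number in the upper bits combined with the lender's physical address) is purely an assumption to make the address unique within the cluster; the class names, event fields, and bump-style address assignment are likewise hypothetical.

```python
class LenderAgent:
    """Agent module of the second node."""

    def __init__(self, node_no, free_phys_base):
        self.node_no = node_no
        self.next_phys = free_phys_base
        self.global_to_phys = {}  # second correspondence

    def handle_second_event(self, size):
        phys = self.next_phys
        self.next_phys += size
        # Hypothetical encoding: node number above bit 48 keeps addresses unique.
        global_addr = (self.node_no << 48) | phys
        self.global_to_phys[global_addr] = phys
        return global_addr


class BorrowerAgent:
    """Agent module of the first node."""

    def __init__(self, unbacked_phys_base):
        self.next_phys = unbacked_phys_base
        self.phys_to_global = {}  # first correspondence

    def handle_third_event(self, event):
        phys = self.next_phys
        self.next_phys += event["size"]
        self.phys_to_global[phys] = event["global_addr"]
        return phys


def allocate(lender, borrower, size, permissions=("read", "write"), semantic="numa"):
    """Management module side: relay the global address to the first node."""
    global_addr = lender.handle_second_event(size)
    third_event = {
        "global_addr": global_addr,
        "size": size,
        "permissions": set(permissions),
        "semantic": semantic,
    }
    local_phys = borrower.handle_third_event(third_event)
    return global_addr, local_phys
```

After `allocate` returns, the second node can translate the global address to its own physical address, and the first node can translate its local physical address to the global address.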
[0118] The local physical address of the first node is a physical address that the first node can allocate, but the memory indicated by this local physical address does not actually exist locally on the first node. For example, if the first node has 16 memory interfaces and can accommodate 16 memory modules of 1 TB each, then the physical addresses that the first node can allocate include the physical addresses corresponding to 16 TB. However, if the first node actually has only 10 such memory modules installed, then only the physical addresses corresponding to these 10 TB actually exist locally on the first node, while the physical addresses corresponding to the other 6 TB do not. Therefore, the first node can choose a physical address from those corresponding to the other 6 TB as the local physical address of the first node in the first correspondence.
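The arithmetic of this example can be written out as a short sketch. It assumes, for illustration only, that installed modules occupy the lowest physical addresses, so that the addresses without local backing form one contiguous range; the function name is hypothetical.

```python
TIB = 1024 ** 4


def unbacked_range(slots, installed, module_bytes=TIB):
    """Return (start, end) of allocatable physical addresses with no local memory."""
    if not 0 <= installed <= slots:
        raise ValueError("installed modules must be between 0 and the slot count")
    return installed * module_bytes, slots * module_bytes
```

With 16 interfaces and 10 installed 1 TB modules, the range spans from 10 TB to 16 TB, which is the 6 TB from which the first node can pick local physical addresses for borrowed memory.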
[0119] As shown in Figure 8, after the first node borrows memory, it can use that memory, for example, by accessing the memory of the second node through the load / store instructions provided by the load / store semantics. For instance, when the application of the first node indicates access to a certain virtual address, the first node translates the virtual address into a physical address local to the first node (for example, through a page table, which is a data structure used to record the mapping relationship between two types of data). Since the memory indicated by the physical address local to the first node does not actually exist locally on the first node, the first node queries the first correspondence mentioned above based on the physical address local to the first node to obtain the global address, and sends a request (i.e., a load / store instruction) to the second node where the memory indicated by the global address is located through the high-speed interconnect bus. If the request is for writing data, the first node also sends the data to be written to the second node through the high-speed interconnect bus.
[0120] After receiving a request, the second node queries the second correspondence mentioned above based on the global address to obtain its local physical address. The second node can access the memory indicated by its local physical address. If the request is for writing data, the second node writes the data to the memory indicated by its local physical address. If the request is for reading data, the second node reads the data from the memory indicated by its local physical address and returns the read data to the first node via the high-speed interconnect bus. Optionally, the first node can selectively cache this data in its local memory; this is not limited here.
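The access path described above (virtual address, local physical address of the first node, global address, physical address on the second node) can be illustrated with a small simulation. The 4 KiB page size, the byte-granular memory model, and the direct method call standing in for the high-speed interconnect bus are all assumptions of this sketch.

```python
PAGE = 4096


class SecondNode:
    def __init__(self):
        self.memory = {}          # physical address -> stored value
        self.global_to_phys = {}  # second correspondence (page granular)

    def handle(self, op, global_addr, value=None):
        offset = global_addr % PAGE
        phys = self.global_to_phys[global_addr - offset] + offset
        if op == "store":
            self.memory[phys] = value
            return None
        return self.memory.get(phys, 0)  # "load"


class FirstNode:
    def __init__(self, remote):
        self.page_table = {}      # virtual page -> local physical page
        self.phys_to_global = {}  # first correspondence (page granular)
        self.remote = remote      # stands in for the interconnect bus

    def access(self, op, vaddr, value=None):
        offset = vaddr % PAGE
        phys_page = self.page_table[vaddr - offset]
        global_addr = self.phys_to_global[phys_page] + offset
        return self.remote.handle(op, global_addr, value)
```

A store issued by the first node therefore lands in the second node's memory, and a later load of the same virtual address returns the stored value.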
[0121] Optionally, when an application on the first node indicates access to a certain virtual address, the first node can determine whether the access satisfies the permissions indicated by the permission identifier. If so, virtual address translation is then performed to send a request to the second node. For example, if the permission identifier indicates that the first node has read permissions but not write permissions for the borrowed memory, then the first node will perform virtual address translation only if the application on the first node requests to read the virtual address; however, if the application on the first node requests to write the virtual address, no virtual address translation will be performed because the first node lacks the necessary permissions.
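A minimal sketch of this permission check follows, assuming the permissions are held as a set of strings and that `translate` is whatever routine performs the virtual address translation (both names are hypothetical).

```python
REQUIRED = {"load": "read", "store": "write"}


def guarded_access(op, vaddr, permissions, translate):
    """Translate vaddr only if the operation is permitted on the borrowed memory."""
    if REQUIRED[op] not in permissions:
        # No translation is performed when the permission is missing.
        raise PermissionError(f"{op} not permitted on borrowed memory")
    return translate(vaddr)
```

Checking before translation means a denied request never reaches the second node.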
[0122] In an exemplary embodiment, where the first allocation event also carries a semantic identifier, the third allocation event carries the semantic identifier indicating presentation semantics. The third allocation event is used to allocate memory of the second node and present the allocated memory (i.e., the borrowed memory) according to the presentation semantics. Therefore, after the management module sends the third allocation event to the agent module of the first node, the agent module of the first node can present the borrowed memory according to the presentation semantics.
[0123] Optionally, the presentation semantics can include, but are not limited to, NUMA semantics, device semantics, and file semantics. For example, in a virtualization scenario, when a virtual machine runs out of memory after memory overcommitment, it needs to borrow memory and employ NUMA semantics (which can be combined with memory tiering, see later for details) to avoid SLA degradation and ensure seamless virtual machine usage. Alternatively, in a database scenario, it is necessary to borrow memory and employ device semantics to implement a second-level cache to improve the performance of online transaction processing (OLTP). Furthermore, in a hybrid transactional / analytical processing (HTAP) database scenario, it is also necessary to borrow memory and employ device semantics to store and share column data.
[0124] Next, we will explain how the first node supports various presentation semantics.
[0125] Regarding NUMA semantics, as shown in Figure 9, after the first node borrows memory, since the semantic identifier indicates NUMA semantics, the agent module of the first node establishes the aforementioned first correspondence, determines the NUMA set corresponding to the borrowed memory, and associates the borrowed memory with the NUMA set, allowing the borrowed memory to be used by the upper-layer application or OS as memory within the NUMA set. Optionally, associating the borrowed memory with the NUMA set can refer to establishing a third correspondence between the identifier of the NUMA set and the local physical address of the first node in the first correspondence. For the upper-layer application or OS, there is no need to be aware of the borrowed memory; it is sufficient to be aware of and access the NUMA set, thus presenting NUMA semantics. For example, when the upper-layer application or OS accesses the NUMA set, it queries the third correspondence based on the identifier of the NUMA set to obtain the local physical address of the first node, and then queries the first correspondence based on the local physical address of the first node to obtain the global address, thereby enabling remote access to the memory indicated by the global address.
[0126] The NUMA set can be an existing NUMA set or a newly created NUMA set. Since the borrowed memory is actually located on the second node, the NUMA set may not correspond to any local processor of the first node (i.e., the NUMA set has no CPU).
[0127] Optionally, the NUMA set supports two management methods. In one method, the borrowed memory in the NUMA set is managed by the system memory management module, allowing it to be used by upper-layer applications or the OS like local memory. In the other method, the borrowed memory in the NUMA set is managed by the memory tiering module, allowing it to be used by upper-layer applications or the OS differently from local memory. The system memory management module and the memory tiering module are, for example, two software components in the first node. Access to the borrowed memory is slower than to the local memory of the first node because accessing the borrowed memory involves communication between the first and second nodes. Therefore, the local memory of the first node can be designated as higher priority memory, with hot data stored in it, and the borrowed memory can be designated as lower priority memory, with cold data stored in it, thus achieving memory tiering.
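By way of a non-limiting illustration, the placement of hot data in local memory and cold data in borrowed memory can be sketched as below. Using per-page access counts as the measure of hotness is an assumption of this sketch, as are the function name and the tier labels.

```python
def place_pages(access_counts, local_capacity):
    """Assign each page to the "local" or "borrowed" tier.

    access_counts:  page id -> number of recent accesses (hotness proxy).
    local_capacity: number of pages that fit in local (higher priority) memory.
    """
    # Hottest pages first; page id breaks ties deterministically.
    ranked = sorted(access_counts, key=lambda page: (-access_counts[page], page))
    hot = set(ranked[:local_capacity])
    return {
        page: ("local" if page in hot else "borrowed")
        for page in access_counts
    }
```

A real memory tiering module would also migrate pages as their access patterns change; the sketch only shows a single placement decision.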
[0128] Regarding device semantics, as shown in Figure 10, after the first node borrows memory, since the semantic identifier indicates device semantics, the proxy module of the first node, after establishing the aforementioned first correspondence, determines the device corresponding to the borrowed memory. The device can be an existing device in the first node or a newly created device. The borrowed memory is associated with the device, for example, by establishing a fourth correspondence between the device identifier and the local physical address of the first node in the first correspondence, thus enabling the borrowed memory to be used as a device by upper-layer applications or the OS. For upper-layer applications or the OS, there is no need to be aware of the borrowed memory; they only need to be aware of the device and access it, thus presenting device semantics. For example, when an upper-layer application or the OS accesses the device, it queries the fourth correspondence based on the device identifier to obtain the local physical address of the first node. Based on the local physical address of the first node, it can query the first correspondence to obtain the global address, thereby enabling remote access to the memory indicated by the global address.
[0129] Optionally, the same memory of the second node can be borrowed by different nodes. Each borrowing node associates the borrowed memory with its local device, enabling device sharing across different nodes. A file system can also be set up on the device, further enabling file system sharing across different nodes. This achieves cross-node cache consistency maintenance for devices and file systems, and provides distributed locking capabilities to prevent conflicts between different nodes. For example, if a first node writes a container image file to memory borrowed from the second node, that container image file can be shared by all nodes that have borrowed that memory. Exemplarily, the global address of the shared memory can be a special identifier distinct from the global address of non-shared memory, allowing the management module and each node to identify shared memory based on this special identifier.
[0130] Regarding file semantics, as shown in Figure 11, after the first node borrows memory, since the semantic identifier indicates file semantics and the file depends on the device, the proxy module of the first node establishes the aforementioned first correspondence, determines the device (existing or newly created) corresponding to the borrowed memory, associates the borrowed memory with the device, determines the file system based on the corresponding device, and associates the borrowed memory with the file system. For example, a fifth correspondence is established between the device identifier, the file system identifier, and the local physical address of the first node in the first correspondence, thereby enabling the borrowed memory to be used as a file system by upper-layer applications or the OS. For upper-layer applications or the OS, there is no need to be aware of the borrowed memory; they only need to be aware of and access the file system, thus presenting file semantics. For example, when an upper-layer application or the OS accesses the file system, it queries the fifth correspondence based on the file system identifier to obtain the local physical address of the first node. Based on the local physical address of the first node, it can query the first correspondence to obtain the global address, thereby enabling remote access to the memory indicated by the global address.
[0131] In an exemplary embodiment, the method provided in this application further includes: canceling the allocation of memory for the second node to the first node, or in other words, returning the memory of the second node to the second node. In one example, the management module can start a timer after allocating memory for the second node to the first node, and automatically trigger the return of the memory to the second node after a certain time period. In another example, the management module can trigger the return of the memory to the second node when node metrics meet certain criteria (e.g., the number of non-free memory pages in the first node is less than the waterline). In yet another example, the return of the memory to the second node can be triggered by the sender of the first allocation event (see the description in step 401, which will not be repeated here). For example, if the sender of the first allocation event determines that the first node no longer needs to borrow memory, it generates a memory return event and sends the memory return event to the management module to trigger the return of the memory to the second node.
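The three triggers described above can be sketched as a single check. The function name, the parameters, and the order in which the triggers are evaluated are assumptions made for illustration and are not taken from this application.

```python
def should_return_memory(allocated_at, now, max_hold_seconds,
                         used_pages, waterline_pages, return_event_received):
    """Return the name of the trigger that fires, or None to keep the loan."""
    if return_event_received:
        # Explicit memory return event from the sender of the first allocation event.
        return "return_event"
    if now - allocated_at >= max_hold_seconds:
        # Timer started by the management module when the memory was allocated.
        return "timer"
    if used_pages < waterline_pages:
        # Node metric: non-free pages of the first node fell below the waterline.
        return "waterline"
    return None
```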
[0132] Optionally, the management module receives memory return events. These events may carry an event identifier and a first node identifier. The event identifier indicates that the memory return event is for returning memory. The management module can perform authorization authentication to determine whether the first node indicated by the first node identifier has the authority to return memory. This prevents third-party devices (i.e., devices other than the sender of the first allocation event) from impersonating the sender of the first allocation event to request memory return, thus ensuring the security and reliability of the memory return process.
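A minimal sketch of the authorization check is given below. It assumes that the management module recorded which sender requested memory for each first node, and that the memory return event carries a sender field; both are hypothetical, since this application only states that the event may carry an event identifier and a first node identifier. A real deployment would additionally verify the sender's credentials, which is outside the scope of this sketch.

```python
from dataclasses import dataclass

RETURN_EVENT_ID = "MEMORY_RETURN"  # hypothetical event identifier value


@dataclass(frozen=True)
class MemoryReturnEvent:
    event_id: str
    first_node_id: str
    sender_id: str  # assumed field, not named in this application


def authorize_return(event, allocation_senders):
    """Accept the event only if it comes from the sender that requested the memory."""
    if event.event_id != RETURN_EVENT_ID:
        return False
    # allocation_senders maps a first node identifier to the sender of its
    # first allocation event (a record assumed to be kept by the management module).
    return allocation_senders.get(event.first_node_id) == event.sender_id
```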
[0133] For example, the management module returns memory to the second node, including: sending a first cancellation event to the first node, the first cancellation event carrying a global address, the first cancellation event being used to indicate the release of the memory indicated by the global address; receiving a response sent by the first node, the response indicating that the memory indicated by the global address has been released; generating a second cancellation event, the second cancellation event carrying a global address, the second cancellation event being used to release the memory indicated by the global address; and sending the second cancellation event to the second node.
[0134] As shown in Figure 12, the management module receives a memory return event. After successful authentication, it sends a first cancellation event to the proxy module of the first node. The proxy module of the first node receives the first cancellation event, releases the memory indicated by the global address, and deletes the previously established first correspondence. If the release and deletion succeed, the proxy module of the first node returns a response to the management module, indicating that the borrowed memory has been released. If the release or deletion fails, the proxy module of the first node returns a failure message to the management module. After receiving the failure message, the management module can wait a certain period before resending the first cancellation event. After receiving the response from the proxy module of the first node, the management module generates a second cancellation event, which carries the global address, and sends the second cancellation event to the proxy module of the second node. The second node receives the second cancellation event and deletes the previously established second correspondence, so that the second node stops lending memory to the first node, thereby cancelling the allocation of the memory indicated by the global address. Optionally, the second node can also delete the data stored in the borrowed memory. Optionally, the management module can update the above memory allocation relationship and memory allocation amount after sending the second cancellation event to the proxy module of the second node.
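The exchange in Figure 12 can be sketched as follows. The class names, the retry limit, and the shape of the second correspondence (assumed here to map a global address to a local address of the second node) are hypothetical; the waiting period before a resend is indicated only by a comment.

```python
class FirstNodeAgent:
    def __init__(self, first_correspondence, failures_before_success=0):
        self.first_correspondence = first_correspondence  # local addr -> global addr
        self._failures_left = failures_before_success     # simulates a busy node

    def on_first_cancellation(self, global_addr):
        if self._failures_left > 0:
            self._failures_left -= 1
            return False  # failure message
        for local, addr in list(self.first_correspondence.items()):
            if addr == global_addr:
                del self.first_correspondence[local]
        return True  # response: borrowed memory released


class SecondNodeAgent:
    def __init__(self, second_correspondence):
        self.second_correspondence = second_correspondence  # global addr -> local addr

    def on_second_cancellation(self, global_addr):
        self.second_correspondence.pop(global_addr, None)


def return_memory(first, second, allocations, global_addr, max_attempts=3):
    """Management-module side: first cancellation, then second cancellation."""
    for _ in range(max_attempts):
        if first.on_first_cancellation(global_addr):
            second.on_second_cancellation(global_addr)
            allocations.pop(global_addr, None)  # update the allocation relationship
            return True
        # A real management module would wait a certain period before resending.
    return False
```

The second cancellation event is sent only after the first node confirms the release, so the second node never reclaims memory that the first node may still be accessing.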
[0135] For example, if the first node has already presented the borrowed memory according to the presentation semantics, the first node needs to remove the presentation of the borrowed memory according to the presentation semantics before deleting the first correspondence.
[0136] For example, regarding NUMA semantics, if the borrowed memory is occupied, the data in the borrowed memory is migrated to other memory. After confirming that the borrowed memory is not occupied, the association between the borrowed memory and the NUMA set is removed. This can be achieved by deleting the third correspondence between the local physical address of the first node corresponding to the borrowed memory and the identifier of the NUMA set, thereby removing the presentation of the borrowed memory according to NUMA semantics. Optionally, if the NUMA set is no longer associated with any memory, the NUMA set can also be deleted.
[0137] For example, regarding device semantics, after determining that the device associated with the borrowed memory is not occupied, the association between the borrowed memory and the device is removed. This can be achieved by deleting the fourth correspondence between the local physical address of the first node corresponding to the borrowed memory and the device identifier, thereby removing the presentation of the borrowed memory according to device semantics. Optionally, "the device is not occupied" can mean that the application deletes the device by calling an interface, or that the device is automatically deleted by the OS after it has been unoccupied for a certain period of time.
[0138] For example, regarding file semantics, after determining that the file system associated with the borrowed memory is not occupied, the association between the borrowed memory and the file system is removed. For example, the fifth correspondence between the device identifier, the file system identifier, and the local physical address of the first node in the first correspondence is deleted, thereby removing the presentation of the borrowed memory according to file semantics.
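The ordering described in the preceding paragraphs (remove the semantic-specific correspondence first, then the first correspondence) can be sketched as follows. The table layout and names are hypothetical, and data migration for occupied NUMA memory is left to the caller rather than implemented here.

```python
def remove_presentation(semantic, local_addr, tables, in_use):
    """Delete the presentation correspondence, then the first correspondence."""
    if in_use:
        # The caller must migrate data or wait until the memory, device,
        # or file system is no longer occupied.
        raise RuntimeError("borrowed memory is still occupied")
    table_for = {"numa": "third", "device": "fourth", "file": "fifth"}
    presentation = tables[table_for[semantic]]
    # Remove every presentation entry that points at this local physical address.
    for key in [k for k, addr in presentation.items() if addr == local_addr]:
        del presentation[key]
    # Only now is the first correspondence deleted.
    tables["first"].pop(local_addr, None)


tables = {
    "first": {0x1000: "g1"},
    "third": {"numa-2": 0x1000},
    "fourth": {},
    "fifth": {},
}
remove_presentation("numa", 0x1000, tables, in_use=False)
```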
[0139] In summary, this embodiment of the application considers both topology information and memory information when determining the second node that lends memory to the first node. The consideration of topology information ensures that the second node is connected to the first node, enabling the first node to access the second node's memory normally. The consideration of memory information ensures that the second node's current memory usage makes it a suitable node, among the multiple nodes, to lend memory to the first node, thus avoiding any impact on the second node's own memory usage. Therefore, this embodiment of the application can accurately determine the second node in real time without the need for pre-reservation, offering greater flexibility. Furthermore, by rationally allocating memory among different nodes (i.e., borrowing and lending memory), the utilization rate of idle memory can be improved, smoothing memory demand across peaks and valleys. The cluster's multiple nodes do not need to be configured with the largest memory capacity required at peak times, but rather with the average memory capacity required across peak and valley times. This not only avoids memory waste but also prevents data from being written to disk, ensuring better business performance. Furthermore, compared to the aforementioned related technology 1, this embodiment does not require data swapping in and out, ensuring business performance. Compared to the aforementioned related technology 2, this embodiment does not require additional memory nodes, reducing costs and preventing fault propagation, thus exhibiting higher reliability.
[0140] The above describes the memory allocation method provided by the embodiments of this application. Corresponding to the above method, the embodiments of this application also provide a memory allocation apparatus. This apparatus is applied to a management module. This apparatus is used to execute the memory allocation method shown in Figure 4 through the various modules shown in Figure 13. As shown in Figure 13, the memory allocation apparatus provided by the embodiments of this application includes the following modules.
[0141] The acquisition module 1301 is used to acquire the first node identifier, which indicates the first node among multiple nodes.
[0142] The acquisition module 1301 is also used to acquire topology information and memory information. The topology information indicates the connection relationship between at least some of the nodes among the multiple nodes, and the memory information indicates the memory usage of at least some of the nodes among the multiple nodes.
[0143] The acquisition module 1301 is also used to acquire the second node identifier based on the topology information and memory information, wherein the second node identifier indicates the second node among multiple nodes.
[0144] The allocation module 1302 is used to allocate memory for the second node to the first node.
[0145] For example, the memory information includes at least one of the following: memory allocation relationships between different nodes in at least some of the nodes, memory allocation amounts between different nodes, total memory of each of at least some of the nodes, used memory, unused memory, borrowed memory, or lent memory.
[0146] For example, the acquisition module 1301 is further configured to acquire latency information, which indicates the duration of the access process allowed by the first node, where the access process is the process by which the first node accesses the allocated memory. The acquisition module 1301 is also configured to acquire the identifier of the second node based on the latency information, topology information, and memory information, wherein the duration of the first node accessing the memory of the second node is less than the duration indicated by the latency information.
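A minimal sketch of how the second node might be selected from latency, topology, and memory information is shown below. The data shapes, the function name, and the tie-breaking rule (prefer the candidate with the most unused memory) are assumptions made for illustration and are not specified by this application.

```python
def select_second_node(first, topology, free_memory, latency, needed, max_latency):
    """Pick a lender that is connected, has enough unused memory, and is fast enough."""
    candidates = [
        node for node in topology.get(first, ())          # topology: connected nodes
        if free_memory.get(node, 0) >= needed             # memory information
        and latency.get((first, node), float("inf")) < max_latency  # latency information
    ]
    # Tie-breaking by most unused memory is an assumption, not from the source.
    return max(candidates, key=lambda n: free_memory[n], default=None)


topology = {"n1": ["n2", "n3", "n4"]}
free_memory = {"n2": 64, "n3": 16, "n4": 128}
latency = {("n1", "n2"): 300, ("n1", "n3"): 200, ("n1", "n4"): 900}
chosen = select_second_node("n1", topology, free_memory, latency,
                            needed=32, max_latency=500)
```

In this hypothetical cluster, n3 lacks enough unused memory and n4 exceeds the allowed access duration, so n2 is the only valid lender.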
[0147] In an exemplary embodiment, the acquisition module 1301 is configured to receive a first allocation event, the first allocation event carrying a node identifier, the first allocation event being used to request memory allocation; and to determine the node identifier carried by the first allocation event as a first node identifier.
[0148] For example, the first allocation event also carries a semantic identifier, which indicates the presentation semantics, and the presentation semantics indicate the presentation method of the allocated memory. The allocation module 1302 is used to send a second allocation event to the second node, which is used to indicate the allocation of memory; receive a global address sent by the second node, which indicates the address of the memory of the second node; generate a third allocation event, which carries the global address and the semantic identifier, and is used to allocate the memory of the second node and present the allocated memory according to the presentation method; and send the third allocation event to the first node.
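The event sequence handled by the allocation module can be sketched as follows. Events are modeled as plain dictionaries and node agents as callables; these representations, the field names, and the `choose_lender` callback are hypothetical.

```python
def allocate(first_alloc_event, choose_lender, lender_agents, borrower_agents):
    """Management-module side of the second and third allocation events."""
    first_id = first_alloc_event["node_id"]
    semantic_id = first_alloc_event["semantic_id"]
    second_id = choose_lender(first_id)
    # Second allocation event: the second node sets memory aside and
    # answers with the global address of that memory.
    global_addr = lender_agents[second_id]({"type": "second_allocation"})
    # Third allocation event: carries the global address and the semantic
    # identifier so the first node can map and present the memory.
    third_event = {
        "type": "third_allocation",
        "global_addr": global_addr,
        "semantic_id": semantic_id,
    }
    borrower_agents[first_id](third_event)
    return second_id, third_event


received = []
lender_agents = {"n2": lambda event: "n2:0x4000"}
borrower_agents = {"n1": received.append}
second_id, third_event = allocate(
    {"node_id": "n1", "semantic_id": "numa"},
    lambda first_id: "n2",
    lender_agents,
    borrower_agents,
)
```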
[0149] In an exemplary embodiment, the allocation module 1302 is further configured to cancel the allocation of memory for the second node to the first node.
[0150] For example, the allocation module 1302 is configured to send a first cancellation event to the first node, the first cancellation event carrying a global address, the first cancellation event being used to indicate the release of the memory indicated by the global address; receive a response sent by the first node, the response indicating that the memory indicated by the global address has been released; generate a second cancellation event, the second cancellation event carrying a global address, the second cancellation event being used to release the memory indicated by the global address; and send a second cancellation event to the second node.
[0151] It should be understood that the beneficial effects of the device shown in Figure 13 in implementing its function are the same as those of the method shown in Figure 4, and will not be repeated here. Furthermore, the device shown in Figure 13 is only illustrated by the division of the above-mentioned functional modules. In practical applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the device and method embodiments provided in the above embodiments belong to the same concept, and their specific implementation process is detailed in the method embodiments, and will not be repeated here.
[0152] For example, embodiments of this application also provide a device for allocating memory, the device including a processor coupled to a memory, the memory storing at least one instruction, the at least one instruction being loaded and executed by the processor to cause the device for allocating memory to perform the memory allocation method shown in Figure 4.
[0153] Referring to Figure 14, which shows a schematic diagram of an exemplary memory allocation device 1400 of this application, the memory allocation device 1400 includes at least one processor 1401, a memory 1403 and at least one network interface 1404.
[0154] Processor 1401 may be, for example, a general-purpose CPU, a digital signal processor (DSP), a network processor (NP), a GPU, a neural-network processing unit (NPU), a data processing unit (DPU), a microprocessor, or one or more integrated circuits or application-specific integrated circuits (ASICs), programmable logic devices (PLDs), other general-purpose processors or other programmable logic devices, discrete gates, transistor logic devices, discrete hardware components, or any combination thereof for implementing the scheme of this application. A PLD may be, for example, a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. A general-purpose processor may be a microprocessor or any conventional processor. It is worth noting that the processor may be a processor supporting the advanced RISC machine (ARM) architecture, where RISC stands for reduced instruction set computer. The processor can implement or execute the various logic blocks, modules, and circuits described in conjunction with the disclosure of this application. A processor can also be a combination of components that perform computing functions, such as a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and so on.
[0155] Optionally, the memory allocation device 1400 also includes a bus 1402. The bus 1402 is used to transfer information between the components of the memory allocation device 1400. The bus 1402 can be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc. The bus 1402 can be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one line is used in Figure 14, but this does not mean that there is only one bus or one type of bus.
[0156] The memory 1403 may be, for example, volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), which is used as an external cache.
[0157] By way of example, but not limitation, many forms of ROM and RAM are available. For example, the ROM may be a compact disc read-only memory (CD-ROM). RAM includes, but is not limited to, static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct rambus RAM (DR RAM).
[0158] The memory 1403 can also be another type of storage device capable of storing static information and instructions, or another type of dynamic storage device capable of storing information and instructions. It can also be optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media, or other magnetic storage devices, or any other medium capable of carrying or storing desired program code in the form of instructions or data structures that can be accessed by a computer, but is not limited thereto. The memory 1403 may, for example, exist independently and be connected to the processor 1401 via bus 1402. The memory 1403 may also be integrated with the processor 1401.
[0159] Network interface 1404 uses any transceiver-like device for communicating with other devices or communication networks, such as Ethernet, radio access network (RAN), or wireless local area network (WLAN). Network interface 1404 can include wired network interfaces and wireless network interfaces. Specifically, network interface 1404 can be an Ethernet interface, such as a Fast Ethernet (FE) interface or a Gigabit Ethernet (GE) interface, an Asynchronous Transfer Mode (ATM) interface, a WLAN interface, a cellular network interface, or a combination thereof. The Ethernet interface can be an optical interface, an electrical interface, or a combination thereof. In some embodiments of this application, network interface 1404 can be used by the memory allocation device 1400 to communicate with other devices.
[0160] In specific implementations, as some embodiments, processor 1401 may include one or more CPUs, such as CPU0 and CPU1 shown in Figure 14. Each of these processors may be a single-core processor or a multi-core processor. Here, a processor may refer to one or more devices, circuits, and / or processing cores for processing data (e.g., computer program instructions).
[0161] In specific implementations, as some embodiments, the memory allocation device 1400 may include multiple processors, such as processor 1401 and processor 1405 shown in Figure 14. Each of these processors may be a single-core processor or a multi-core processor. Here, a processor may refer to one or more devices, circuits, and / or processing cores for processing data (such as computer program instructions).
[0162] In some embodiments, memory 1403 is used to store program instructions 1410 for executing the scheme of this application, and processor 1401 can execute the program instructions 1410 stored in memory 1403. That is, the memory allocation device 1400 can implement the method provided in the method embodiment, i.e., the method shown in Figure 4, through processor 1401 and program instructions 1410 in memory 1403. Program instructions 1410 may include one or more software modules. Optionally, processor 1401 itself may also store program instructions for executing the scheme of this application.
[0163] In specific implementation, the memory allocation device 1400 of this application can correspond to the first network element device used to execute the above method. The processor 1401 in the memory allocation device 1400 reads the instructions in the memory 1403, so that the memory allocation device 1400 shown in Figure 14 can execute all or part of the steps in the method embodiment.
[0164] The memory allocation device 1400 can also correspond to the apparatus shown in Figure 13 above, where each functional module is implemented in software on the memory allocation device 1400. In other words, the functional modules included in the apparatus shown in Figure 13 are generated by the processor 1401 of the memory allocation device 1400 reading the program instructions 1410 stored in the memory 1403.
[0165] In the method shown in Figure 4, each step is completed through integrated logic circuits in the hardware or instructions in the processor of the memory allocation device 1400. The steps of the method embodiments disclosed in this application can be directly implemented by the hardware processor, or by a combination of hardware and software modules in the processor. The software modules can reside in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other mature storage media in the art. Since this storage medium is located in memory, the processor reads information from the memory and, in conjunction with its hardware, completes the steps of the above method embodiments; to avoid repetition, these steps will not be described further here.
[0166] In one example, this application embodiment provides a memory allocation chip, the chip including a processor, the processor being used to call instructions from a memory and execute them, causing a computer with the chip installed to perform the memory allocation method shown in Figure 4.
[0167] In another example, this application embodiment also provides another chip for allocating memory. The chip includes an input interface, an output interface, a processor, and a memory. The input interface, output interface, processor, and memory are connected through an internal connection path. The processor is used to execute code in the memory. When the code is executed, the computer with the chip installed executes the memory allocation method shown in Figure 4.
[0168] For example, embodiments of this application provide a computer-readable storage medium storing at least one instruction, which is loaded and executed by a processor to implement the memory allocation method shown in Figure 4.
[0169] For example, embodiments of this application provide a computer program or computer program product, which includes computer programs or instructions that are executed by a processor to enable a computer to implement the memory allocation method shown in Figure 4.
[0170] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product. A computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the flow or function according to this application is generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, digital subscriber line) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium can be any available medium that a computer can access or a data storage device such as a server or data center that integrates one or more available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., SSD), etc.
[0171] In this application, the terms "first," "second," etc., are used to distinguish identical or similar items with substantially the same function. It should be understood that there is no logical or temporal dependency between "first," "second," and "nth," nor does it limit the quantity or order of execution. It should also be understood that although the following description uses the terms "first," "second," etc., to describe various elements, these elements should not be limited by the terms. These terms are merely used to distinguish one element from another.
[0172] It should also be understood that, in the various embodiments of this application, the sequence number of each process does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.
[0173] In this application, the term "at least one" means one or more, and the term "multiple" means two or more. For example, multiple second devices means two or more second devices. The terms "system" and "network" are often used interchangeably herein.
[0174] It should be understood that the terminology used in the description of the various examples herein is for the purpose of describing the particular examples only and is not intended to be limiting. As used in the description of the various examples and the appended claims, the singular forms “a” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise.
[0175] It should also be understood that the term "and / or" as used herein refers to and covers any and all possible combinations of one or more of the associated listed items. The term "and / or" describes an association between related objects, indicating that three relationships can exist; for example, A and / or B can represent: A alone, A and B simultaneously, or B alone. Additionally, the character " / " in this application generally indicates that the preceding and following related objects are in an "or" relationship.
[0176] It should also be understood that the term "if" can be interpreted as meaning "when" or "upon", or "in response to determining" or "in response to detecting". Similarly, depending on the context, the phrases "if it is determined…" or "if [the stated condition or event] is detected" can be interpreted as meaning "when it is determined…", or "in response to determining…", or "when [the stated condition or event] is detected", or "in response to detecting [the stated condition or event]".
[0177] The above are merely embodiments of this application and are not intended to limit this application. Any modifications, equivalent substitutions, improvements, etc., made within the principles of this application should be included within the protection scope of this application.
Claims
1. A method for allocating memory, characterized in that the method is applied to a management module corresponding to a cluster, the cluster comprising a plurality of nodes, and the method comprises: obtaining a first node identifier, the first node identifier indicating a first node among the plurality of nodes; obtaining topology information and memory information, wherein the topology information indicates the connection relationship between at least some of the plurality of nodes, and the memory information indicates the memory usage of at least some of the plurality of nodes; obtaining a second node identifier based on the topology information and the memory information, the second node identifier indicating a second node among the plurality of nodes; and allocating memory of the second node to the first node.
2. The method according to claim 1, characterized in that the memory information includes at least one of the following: the memory allocation relationship between different nodes in the at least some nodes, the amount of memory allocated between the different nodes, the total memory of each of the at least some nodes, the used memory, the unused memory, the borrowed memory, or the lent memory.
3. The method according to claim 1 or 2, characterized in that the method further comprises: acquiring latency information, the latency information indicating the duration of the access process allowed by the first node, the access process being the process by which the first node accesses the allocated memory; and the step of obtaining the second node identifier based on the topology information and the memory information comprises: obtaining the second node identifier based on the latency information, the topology information and the memory information, wherein the time taken by the first node to access the memory of the second node is less than the duration indicated by the latency information.
4. The method according to any one of claims 1-3, characterized in that the obtaining of the first node identifier comprises: receiving a first allocation event, the first allocation event carrying a node identifier and being used to request memory allocation; and determining the node identifier carried by the first allocation event as the first node identifier.
5. The method according to claim 4, characterized in that the first allocation event further carries a semantic identifier, the semantic identifier indicating presentation semantics, and the presentation semantics indicating the presentation method of the allocated memory; and the allocating of memory of the second node to the first node comprises: sending a second allocation event to the second node, the second allocation event being used to indicate memory allocation; receiving a global address sent by the second node, the global address indicating the address of the memory of the second node; generating a third allocation event, the third allocation event carrying the global address and the semantic identifier and being used to allocate the memory of the second node and present the allocated memory according to the presentation method; and sending the third allocation event to the first node.
6. The method according to any one of claims 1-5, characterized in that the method further comprises: canceling the allocation of the memory of the second node to the first node.
7. The method according to claim 6, characterized in that the canceling of the allocation of the memory of the second node to the first node comprises: sending a first cancellation event to the first node, the first cancellation event carrying a global address and being used to indicate the release of the memory occupied by the global address; receiving a response sent by the first node, the response indicating that the memory occupied by the global address has been released; generating a second cancellation event, the second cancellation event carrying the global address and being used to deallocate the memory indicated by the global address; and sending the second cancellation event to the second node.
8. An apparatus for allocating memory, characterized in that the apparatus comprises: an acquisition module, configured to acquire a first node identifier, the first node identifier indicating a first node among a plurality of nodes; the acquisition module being further configured to acquire topology information and memory information, wherein the topology information indicates the connection relationship between at least some of the plurality of nodes, and the memory information indicates the memory usage of at least some of the plurality of nodes; the acquisition module being further configured to acquire a second node identifier based on the topology information and the memory information, the second node identifier indicating a second node among the plurality of nodes; and an allocation module, configured to allocate memory of the second node to the first node.
9. The apparatus according to claim 8, characterized in that the memory information includes at least one of the following: the memory allocation relationship between different nodes among the at least some nodes, the amount of memory allocated between the different nodes, the total memory of each of the at least some nodes, the used memory, the unused memory, the borrowed memory, or the lent memory.
10. The apparatus according to claim 8 or 9, characterized in that the acquisition module is further configured to acquire latency information, the latency information indicating the duration that the first node allows for an access process, the access process being the process by which the first node accesses the allocated memory; and the acquisition module is configured to acquire the second node identifier based on the latency information, the topology information, and the memory information, wherein the time consumed by the first node to access the memory of the second node is less than the duration indicated by the latency information.
11. The apparatus according to any one of claims 8-10, characterized in that the acquisition module is configured to receive a first allocation event, the first allocation event carrying a node identifier and being used to request memory allocation; and to determine the node identifier carried by the first allocation event as the first node identifier.
12. The apparatus according to claim 11, characterized in that the first allocation event further carries a semantic identifier, the semantic identifier indicating presentation semantics, and the presentation semantics indicating the presentation method of the allocated memory; and the allocation module is configured to: send a second allocation event to the second node, the second allocation event being used to indicate memory allocation; receive a global address sent by the second node, the global address indicating the address of the memory of the second node; generate a third allocation event, the third allocation event carrying the global address and the semantic identifier and being used to allocate the memory of the second node and present the allocated memory according to the presentation method; and send the third allocation event to the first node.
13. The apparatus according to any one of claims 8-12, characterized in that the allocation module is further configured to cancel the allocation of the memory of the second node to the first node.
14. The apparatus according to claim 13, characterized in that the allocation module is configured to: send a first cancellation event to the first node, the first cancellation event carrying a global address and being used to indicate the release of the memory occupied by the global address; receive a response sent by the first node, the response indicating that the memory occupied by the global address has been released; generate a second cancellation event, the second cancellation event carrying the global address and being used to deallocate the memory indicated by the global address; and send the second cancellation event to the second node.
15. A chip for allocating memory, characterized in that the chip includes a processor configured to call and run instructions stored in a memory, so that a computer on which the chip is installed performs the method for allocating memory according to any one of claims 1-7.
16. A chip for allocating memory, characterized in that the chip includes an input interface, an output interface, a processor, and a memory, the input interface, the output interface, the processor, and the memory being connected through an internal connection path; the processor is configured to execute code in the memory, and when the code is executed, a computer on which the chip is installed performs the method for allocating memory according to any one of claims 1-7.
17. A device for allocating memory, characterized in that the device includes a processor coupled to a memory, the memory storing at least one instruction, and the at least one instruction being loaded and executed by the processor to cause the device to perform the method for allocating memory according to any one of claims 1-7.
18. A computer-readable storage medium, characterized in that the computer-readable storage medium stores at least one instruction, the at least one instruction being loaded and executed by a processor to implement the method for allocating memory according to any one of claims 1-7.
19. A computer program product, characterized in that the computer program product includes a computer program or instructions, the computer program or instructions being executed by a processor to enable a computer to implement the method for allocating memory according to any one of claims 1-7.
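For illustration only, and without limiting the claims, the selection of the second node recited in claims 1 and 3 and the allocation and cancellation recited in claims 1 and 6 might be sketched as follows. The sketch makes several assumptions that do not come from this application: the topology information is encoded as a mapping from each node to its directly connected nodes with an access time in nanoseconds, the preference for the lowest access time (and then for the most unused memory) is one possible policy among many, and the names `NodeMemory`, `select_second_node`, `allocate`, and `cancel` are hypothetical. The event exchange and global address of claims 5 and 7 are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class NodeMemory:
    """Per-node memory information (units are arbitrary, e.g. GiB)."""
    total: int
    used: int
    borrowed: int = 0  # memory borrowed from other nodes
    lent: int = 0      # memory lent out to other nodes

    @property
    def unused(self) -> int:
        # Local memory that is neither used locally nor already lent out.
        return self.total - self.used - self.lent


# Assumed topology encoding: node -> {directly connected node: access time in ns}.
Topology = Dict[str, Dict[str, int]]
# Assumed allocation relationship: (lending node, borrowing node) -> amount.
Loans = Dict[Tuple[str, str], int]


def select_second_node(first: str, topology: Topology,
                       memory: Dict[str, NodeMemory], size: int,
                       max_access_ns: Optional[int] = None) -> Optional[str]:
    """Pick a connected node that can lend `size` within the latency bound."""
    candidates = []
    for peer, access_ns in topology.get(first, {}).items():
        info = memory.get(peer)
        if info is None or info.unused < size:
            continue  # no memory information, or not enough unused memory
        if max_access_ns is not None and access_ns >= max_access_ns:
            continue  # access time must be less than the allowed duration
        candidates.append((access_ns, -info.unused, peer))
    if not candidates:
        return None
    # Assumed policy: lowest access time first, then most unused memory.
    return min(candidates)[2]


def allocate(first: str, second: str, size: int,
             memory: Dict[str, NodeMemory], loans: Loans) -> None:
    """Record that `second` lends `size` to `first`."""
    memory[second].lent += size
    memory[first].borrowed += size
    loans[(second, first)] = loans.get((second, first), 0) + size


def cancel(first: str, second: str,
           memory: Dict[str, NodeMemory], loans: Loans) -> int:
    """Undo every allocation from `second` to `first`; return the amount freed."""
    size = loans.pop((second, first), 0)
    memory[second].lent -= size
    memory[first].borrowed -= size
    return size
```

In this sketch the management module would call `select_second_node` with the identifier of the requesting node and, if a node is returned, call `allocate` to update the memory information; `cancel` restores the memory information when the allocation is withdrawn.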