Memory management method, input / output request scheduling method, and memory controller
By dynamically adjusting the queue depth based on the characteristics of input and output requests using the memory controller, the problem of unreasonable queue depth settings in storage devices is solved, achieving a balance between latency and throughput, and improving the performance and resource utilization efficiency of the storage system.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- HEFEI KAIMENG TECHNOLOGY CO LTD
- Filing Date
- 2026-03-25
- Publication Date
- 2026-06-30
AI Technical Summary
In the prior art, the queue depth setting of the storage device fails to distinguish the latency and throughput requirements of different input and output requests, resulting in increased queuing time for latency-sensitive requests or limited parallel processing capability for throughput-sensitive requests. Furthermore, the host system cannot adapt to changes in queue depth on the storage device side in a timely manner, leading to improper resource utilization.
The memory controller calculates a throughput tendency and a latency tendency for each input / output request based on its characteristics, dynamically adjusts the queue depth of the storage device, and reports the new value to the host system through the queue depth descriptor so that the host's request distribution strategy stays consistent with the queue depth of the storage device.
It achieves a balance between latency control and throughput requirements under different load scenarios, avoids storage device overload or resource idleness, and improves the overall performance and efficiency of the storage system.
Figure CN122308734A_ABST
Abstract
Description
Technical Field
[0001] This disclosure relates to the field of storage system technology, and in particular to a memory management method, an input / output request scheduling method, and a memory controller for queue depth control of storage devices.
Background Technology
[0002] In storage devices such as solid-state drives (SSDs), universal flash storage (UFS), and embedded multi-media cards (eMMCs), queue depth (QD) is a key factor affecting device performance. Queue depth represents the number of input / output (I / O) requests that the storage device can simultaneously accept and process.
[0003] In existing technologies, queue depth is typically set based on a preset fixed value or simply adjusted dynamically according to the overall system load level. However, these methods fail to differentiate the latency and throughput requirements of various input / output requests when determining queue depth. For example, when a storage device simultaneously handles latency-sensitive requests (such as random database read / write) and throughput-sensitive requests (such as sequential large file reads), a fixed queue depth cannot simultaneously accommodate the different performance requirements of these two types of requests: an excessively large queue depth increases the queuing time for latency-sensitive requests, while an excessively small queue depth limits the parallel processing capability of throughput-sensitive requests.
[0004] Furthermore, when the queue depth on the storage device side changes, the host system often fails to detect this change in a timely manner, causing the host system to continue sending input / output requests according to the original queue depth, which may result in queue overflow or insufficient resource utilization on the storage device side.
Summary of the Invention
[0005] In view of this, this disclosure provides a memory management method and a memory controller, applied to a storage device configured with a memory controller and a memory module. This method can adaptively adjust the queue depth according to the actual characteristics of the currently received input / output requests, so as to balance latency control and throughput requirements under different load scenarios.
[0006] According to one aspect of this disclosure, a memory management method is provided, applied to a storage device configured with a memory controller and a memory module. In the method, the storage device receives multiple input / output requests from a host system. After receiving the multiple input / output requests, feature information is extracted from each of the multiple input / output requests, and a throughput tendency and a latency tendency are calculated for each input / output request based on the extracted feature information. The throughput tendency characterizes the degree to which the input / output request prefers data transmission throughput capabilities, and the latency tendency characterizes the degree to which the input / output request prefers low-latency responses. Based on the calculated throughput tendency and latency tendency, the queue depth of the storage device is adjusted to adapt the queue depth to the load characteristics of the current input / output requests.
[0007] According to another aspect of this disclosure, a memory controller is provided, suitable for a storage device configured with a memory module. The memory controller includes a memory interface control circuit and a processor. The memory interface control circuit is electrically connected to the memory module. The processor is electrically connected to the memory interface control circuit and configured to perform the following operations: receive a plurality of input / output requests from a host system; extract feature information of each input / output request from the plurality of input / output requests, and calculate a throughput tendency and a latency tendency for each input / output request based on the feature information; and adjust the queue depth of the storage device according to the throughput tendency and latency tendency. Thus, the memory controller is able to perform a dual-objective quantitative evaluation based on the feature information of the input / output requests and dynamically control the queue depth of the storage device accordingly.
[0008] According to another aspect of this disclosure, an input / output request scheduling method is provided, applied to a host system communicatively connected to a storage device, wherein the storage device is configured with a memory controller and a memory module. In the method, the host system sends multiple input / output requests to the storage device and detects changes in the queue depth descriptor of the storage device, wherein the queue depth descriptor is dynamically updated by the storage device based on the throughput and latency tendencies of the multiple input / output requests. When a change in the queue depth descriptor is detected, the host system reads the updated queue depth value and adjusts the subsequent input / output request sending strategy according to the queue depth value. The sending strategy includes adjusting the number of input / output requests simultaneously sent to the storage device to ensure it does not exceed the queue depth value, thereby coordinating the request sending behavior of the host system with the queue depth adjustment result on the storage device side.
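The host-side behavior described in paragraph [0008] can be illustrated with a minimal Python sketch. The class name `HostDispatcher`, its methods, and the `read_descriptor` callback are hypothetical names introduced only for illustration; the disclosure does not specify a host API. The sketch also assumes that requests already sent are not recalled when the queue depth shrinks.

```python
from collections import deque


class HostDispatcher:
    """Hypothetical host-side dispatcher that caps outstanding requests."""

    def __init__(self, queue_depth):
        self.queue_depth = queue_depth  # last queue depth value read from the device
        self.pending = deque()          # requests waiting on the host side
        self.in_flight = set()          # requests sent and not yet completed

    def submit(self, request_id):
        """Queue a request and send it if the cap allows."""
        self.pending.append(request_id)
        return self._dispatch()

    def on_descriptor_change(self, read_descriptor):
        """Re-read the queue depth value after a change is detected."""
        self.queue_depth = read_descriptor()
        return self._dispatch()

    def complete(self, request_id):
        """Mark a request as finished, which may free a slot."""
        self.in_flight.discard(request_id)
        return self._dispatch()

    def _dispatch(self):
        # Send pending requests only while the number in flight stays
        # below the current queue depth value.
        sent = []
        while self.pending and len(self.in_flight) < self.queue_depth:
            request_id = self.pending.popleft()
            self.in_flight.add(request_id)
            sent.append(request_id)
        return sent
```

With a queue depth of 2, a third and fourth request stay pending on the host until either a completion arrives or the descriptor reports a larger value.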
[0009] Based on the above, the memory management method, input / output request scheduling method, and memory controller provided in this disclosure establish a quantitative evaluation mechanism for each input / output request in terms of latency and throughput requirements by extracting feature information from input / output requests and calculating throughput and latency tendencies. This allows the adjustment of queue depth to no longer be limited to the overall system load level, but to reflect the actual load characteristics of the current input / output requests. When latency-sensitive input / output requests occur in clusters, the storage device can reduce the queue depth to reduce queuing time; when throughput-sensitive input / output requests dominate, the storage device can increase the queue depth to improve parallel processing capabilities. Therefore, compared to schemes that adjust queue depth based on a single load dimension, this disclosure can achieve targeted adjustment of queue depth across different load characteristics.
[0010] Furthermore, this disclosure establishes a feedback path between the storage device and the host system by updating the queue depth descriptor and sending a change notification to the host system after adjusting the queue depth on the storage device side. Upon receiving the change notification, the host system reads the updated queue depth value and adjusts its request distribution strategy accordingly. This ensures that the number of requests distributed by the host system matches the current queue depth of the storage device, avoiding storage device overload or resource idleness issues caused by the host system continuing to distribute requests at the original number due to its unawareness of queue depth changes.
Attached Figure Description
[0011] Figure 1 is a block diagram of a storage device and a host system according to an embodiment of the present disclosure;
[0012] Figure 2 is a main flowchart of a memory management method according to an embodiment of the present disclosure;
[0013] Figure 3 is a detailed flowchart of feature information extraction and dual-objective evaluation according to an embodiment of the present disclosure;
[0014] Figure 4 is a flowchart illustrating the determination and execution of queue depth adjustment according to an embodiment of the present disclosure;
[0015] Figure 5 is a flowchart illustrating the order-preserving determination and multi-queue allocation process for instruction execution order optimization according to an embodiment of the present disclosure;
[0016] Figure 6 is a flowchart illustrating the execution of a selection rule according to an embodiment of the present disclosure;
[0017] Figure 7 is a flowchart illustrating the determination of rearrangement constraints according to an embodiment of the present disclosure;
[0018] Figure 8 is a timing diagram of component interactions between a storage device and a host system according to an embodiment of the present disclosure;
[0019] Figure 9 is a flowchart of a host-side input / output request scheduling method according to an embodiment of the present disclosure.
Detailed Implementation
[0020] Reference will now be made in detail to exemplary embodiments of the present disclosure, examples of which are illustrated in the accompanying drawings. Wherever possible, the same component reference numerals are used in the drawings and description to denote the same or similar parts.
[0021] Please refer to Figure 1. The host system 10 is, for example, a personal computer, a laptop computer, or a server. The host system 10 includes a processor 110 (also called a second processor), host memory 120 (also called host RAM), and a data transmission interface circuit 130. In this embodiment, the processor 110 is coupled (also called electrically connected) to the host memory 120 and the data transmission interface circuit 130. In another embodiment, the processor 110, host memory 120, and data transmission interface circuit 130 are electrically connected to each other via a system bus. In this embodiment, the processor 110, host memory 120, and data transmission interface circuit 130 may be located on the motherboard of the host system 10.
[0022] The storage device 20 includes a memory controller 210, a memory module 220 (also known as a rewritable non-volatile memory module), and a connection interface circuit 230. The memory controller 210 includes a processor 211 (also known as a first processor), a data management circuit 212, a memory interface control circuit 213, and a buffer memory 214.
[0023] In this embodiment, the host system 10 is electrically connected to the storage device 20 via the data transmission interface circuit 130 and the connection interface circuit 230 to perform data access operations. For example, the host system 10 can store data to or read data from the storage device 20 via the data transmission interface circuit 130.
[0024] In this embodiment, the number of data transmission interface circuits 130 can be one or more. Through the data transmission interface circuits 130, the motherboard can be electrically connected to the storage device 20 via wired or wireless means. The storage device 20 can be, for example, a USB flash drive, memory card, solid-state drive (SSD), or wireless storage device. The wireless storage device can be, for example, a Near Field Communication (NFC) storage device, a WiFi storage device, a Bluetooth storage device, or a Bluetooth Low Energy storage device (e.g., iBeacon), or other storage devices based on various wireless communication technologies. Furthermore, the motherboard can also be electrically connected via the system bus to various I / O devices such as a Global Positioning System (GPS) module, network interface card, wireless transmission device, keyboard, screen, and speaker.
[0025] In this embodiment, the data transmission interface circuit 130 and the connection interface circuit 230 are interface circuits compatible with the Peripheral Component Interconnect Express (PCI Express) standard. Furthermore, data transmission between the data transmission interface circuit 130 and the connection interface circuit 230 utilizes the Non-Volatile Memory Express (NVMe) communication protocol.
[0026] In another embodiment, the data transmission interface circuit 130 and the connection interface circuit 230 are interface circuits compatible with the Universal Flash Storage (UFS) standard, and data transmission between the data transmission interface circuit 130 and the connection interface circuit 230 is performed using the UFS communication protocol. In this case, the host system 10 and the storage device 20 exchange instructions and data via the UFS Protocol Information Unit (UPIU). It should be noted that the memory management method provided in this disclosure is not limited to a specific storage protocol, and those skilled in the art can apply it to other storage protocol environments provided that it conforms to the principles of the technical solution of this disclosure. Furthermore, in another embodiment, the connection interface circuit 230 may be packaged in a chip with the memory controller 210, or the connection interface circuit 230 may be disposed outside a chip containing the memory controller 210.
[0027] In this embodiment, the host memory 120 is used to temporarily store instructions or data executed by the processor 110. In this embodiment, the host memory 120 may be Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), etc. However, it should be understood that this disclosure is not limited to this, and the host memory 120 may also be other suitable memories.
[0028] The memory controller 210 is used to execute multiple logic gates or control instructions implemented in hardware or firmware, and to perform operations such as writing, reading and erasing data in the memory module 220 according to the instructions of the host system 10, and to execute the memory management method provided in this disclosure.
[0029] More specifically, the processor 211 in the memory controller 210 is hardware with computing capabilities, used to control the overall operation of the memory controller 210. Specifically, the processor 211 is programmed with multiple control instructions / program codes, and these control instructions / program codes are executed when the storage device 20 is operating to perform operations such as writing, reading, and erasing data.
[0030] Furthermore, processor 211 is configured to execute the memory management method provided in this disclosure. Specifically, processor 211 is configured to receive multiple input / output requests from host system 10. The input / output requests referred to herein are data read or data write instructions issued by host system 10 to storage device 20, with each input / output request corresponding to one data access operation on memory module 220. After receiving multiple input / output requests, processor 211 extracts feature information for each input / output request. The feature information referred to herein is a set of quantitative indicators that characterize the load attributes of the input / output request, including but not limited to the request size, physical address, and priority indication information of the input / output request. Based on the feature information, processor 211 calculates the throughput tendency and latency tendency for each input / output request. The throughput tendency is a value between 0 and 1, used to characterize the extent to which the input / output request prioritizes improving data transmission throughput; a higher value indicates that the request is more suitable for parallel processing at a deeper queue depth to obtain a higher data transmission rate. The latency tendency is also a value between 0 and 1, representing the extent to which an input / output request prioritizes reducing response latency; a higher value indicates that the request requires a shallower queue depth to reduce queuing time and achieve a shorter response time. Processor 211 adjusts the queue depth of storage device 20 based on the throughput tendency and latency tendency. The queue depth, as referred to here, is the maximum number of input / output requests that storage device 20 can accept and process at the same time. Its magnitude directly affects the parallel processing capability of storage device 20 and the queuing time of individual requests.
Furthermore, processor 211 is configured to optimize instruction execution order based on the characteristics of input / output requests and update the queue depth descriptor of storage device 20 after adjusting the queue depth to report the change to host system 10. The queue depth descriptor, as referred to here, is a parameter field in storage device 20 used to record the current queue depth value, readable by host system 10. Host system 10 can determine the maximum number of concurrent requests currently allowed by storage device 20 by reading the queue depth descriptor.
[0031] It is worth mentioning that, in this embodiment, the processor 110 and the processor 211 are each, for example, a central processing unit (CPU), a microprocessor, another programmable processing unit, a digital signal processor (DSP), a programmable controller, an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or another similar circuit component, and this disclosure is not limited thereto.
[0032] In this embodiment, as described above, the memory controller 210 further includes a data management circuit 212 and a memory interface control circuit 213. It should be noted that the operations performed by each component of the memory controller 210 can also be considered as operations performed by the memory controller 210 itself.
[0033] The data management circuit 212 is electrically connected to the processor 211, the memory interface control circuit 213, and the connection interface circuit 230. The data management circuit 212 receives instructions from the processor 211 to perform data transmission. For example, it reads data from the host system 10 (e.g., host memory 120) via the connection interface circuit 230 and writes the read data into the memory module 220 via the memory interface control circuit 213. Alternatively, it performs a read operation according to a read instruction from the host system 10, reading data from one or more physical units of the memory module 220 via the memory interface control circuit 213 and writing the read data into the host system 10 via the connection interface circuit 230.
[0034] In one embodiment, the data management circuit 212 also works with the processor 211 to receive and forward multiple input / output requests from the host system 10. Specifically, the data management circuit 212 receives input / output requests from the host system 10 via the connection interface circuit 230 and transmits the input / output requests to the processor 211, so that the processor 211 can extract the feature information of the input / output requests and perform subsequent queue depth adjustment and instruction execution order optimization operations. After the processor 211 determines the execution order of the target input / output requests based on throughput tendency and latency tendency, the data management circuit 212, according to the instructions of the processor 211, transmits the data corresponding to the target input / output requests to the memory module 220 for execution via the memory interface control circuit 213.
[0035] In another embodiment, the data management circuit 212 may also be integrated into the processor 211. The memory interface control circuit 213 is used to receive instructions from the processor 211 and cooperate with the data management circuit 212 to perform write (also known as programming) operations, read operations, or erase operations on the memory module 220.
[0036] Furthermore, data to be written to memory module 220 is converted into a format acceptable to memory module 220 via memory interface control circuit 213. Specifically, if processor 211 needs to access memory module 220, processor 211 transmits a corresponding instruction sequence to memory interface control circuit 213 to instruct memory interface control circuit 213 to perform the corresponding operation. For example, these instruction sequences may include write instruction sequences indicating the writing of data, read instruction sequences indicating the reading of data, erase instruction sequences indicating the erasure of data, and corresponding instruction sequences for indicating various memory operations. These instruction sequences may include one or more signals, or data on the bus. These signals or data may include instruction codes or program codes. For example, a read instruction sequence may include information such as the read identification code, memory address, and physical address.
[0037] Furthermore, the memory controller 210 establishes a logical-to-physical address mapping table and a physical-to-logical address mapping table to record the mapping relationship between the logical addresses of logical units (e.g., logical blocks, logical pages) and the physical addresses of physical units (e.g., physical erase units / physical blocks, physical pages) configured for the memory module 220. In other words, the memory controller 210 can use the logical-to-physical address mapping table (also called the logical-to-physical mapping table) to find the physical unit mapped to a logical unit (e.g., find the physical page mapped to a logical page; find the physical address mapped to a logical address), and the memory controller 210 can use the physical-to-logical address mapping table (also called the physical-to-logical mapping table) to find the logical unit mapped to a physical unit (e.g., find the logical page mapped to a physical page; find the logical address mapped to a physical address).
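As a rough illustration of the two mapping tables in paragraph [0037], the following Python sketch keeps a logical-to-physical dictionary and its inverse in step. The class `MappingTables` and its method names are assumptions made for this example, and real controllers typically use more compact structures than dictionaries.

```python
class MappingTables:
    """Hypothetical pair of tables kept as mutual inverses."""

    def __init__(self):
        self.l2p = {}  # logical address -> physical address
        self.p2l = {}  # physical address -> logical address

    def map(self, logical, physical):
        """Record that `logical` is now stored at `physical`."""
        # Drop stale entries first so the two tables never disagree.
        old_physical = self.l2p.pop(logical, None)
        if old_physical is not None:
            self.p2l.pop(old_physical, None)
        old_logical = self.p2l.pop(physical, None)
        if old_logical is not None:
            self.l2p.pop(old_logical, None)
        self.l2p[logical] = physical
        self.p2l[physical] = logical

    def physical_of(self, logical):
        """Look up the physical address mapped to a logical address."""
        return self.l2p.get(logical)

    def logical_of(self, physical):
        """Look up the logical address mapped to a physical address."""
        return self.p2l.get(physical)
```

Remapping a logical page to a new physical page, as happens when flash data is rewritten out of place, leaves the old physical page without a logical owner in this sketch.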
[0038] The buffer memory 214 is electrically connected to the processor 211 and is used to temporarily store data and instructions from the host system 10, data from the memory module 220, and various system data for managing the storage device 20.
[0039] In this embodiment, the buffer memory 214 is also used to store queue depth management data and instruction scheduling data required by this disclosure. Specifically, the buffer memory 214 is used to temporarily store the current queue depth value and queue depth descriptor maintained by the processor 211 during the execution of the memory management method, as well as the feature information extracted by the processor 211 from multiple input / output requests and the throughput tendency and latency tendency calculated based on the feature information.
[0040] Furthermore, during instruction execution order optimization by the processor 211, the buffer memory 214 is also used to temporarily store input / output requests queued in multiple execution queues. The multiple execution queues referred to here are multiple first-in-first-out buffers categorized and stored by the processor 211 according to the characteristic information of the input / output requests. In embodiments of this disclosure, these include three types: high-priority queues, latency-sensitive queues, and throughput-sensitive queues. Specifically, the high-priority queue stores input / output requests explicitly marked as having the highest priority by the host system 10; the latency-sensitive queue stores input / output requests with a latency tendency higher than a throughput tendency; and the throughput-sensitive queue stores input / output requests with a throughput tendency higher than a latency tendency. The processor 211 selects a target input / output request from the multiple execution queues for execution according to a preset execution selection rule.
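The classification into the three execution queues of paragraph [0040] might be sketched in Python as follows. The names `IORequest` and `ExecutionQueues` are hypothetical. The text does not say where a request with equal tendencies goes, so routing ties to the latency-sensitive queue is an assumption of this sketch, as is the omission of the execution selection rule.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class IORequest:
    request_id: int
    throughput_tendency: float      # value between 0 and 1
    latency_tendency: float         # value between 0 and 1
    highest_priority: bool = False  # explicitly marked by the host


class ExecutionQueues:
    """Three first-in-first-out buffers, one per request category."""

    def __init__(self):
        self.queues = {
            "high_priority": deque(),
            "latency_sensitive": deque(),
            "throughput_sensitive": deque(),
        }

    def enqueue(self, request):
        """Place a request in its queue and return the queue name."""
        if request.highest_priority:
            name = "high_priority"
        elif request.throughput_tendency > request.latency_tendency:
            name = "throughput_sensitive"
        else:
            # Latency tendency is higher, or the two are equal
            # (tie handling is an assumption of this sketch).
            name = "latency_sensitive"
        self.queues[name].append(request)
        return name
```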
[0041] The memory module 220 is electrically connected to the memory controller 210 (specifically, electrically connected to the memory interface control circuit 213) and is used to store user data sent by the host system 10. In one embodiment, the memory module 220 serves as the data access object ultimately pointed to by multiple input / output requests. After adjusting the queue depth and optimizing the instruction execution order according to the throughput tendency and latency tendency, the processor 211 performs data write or data read operations corresponding to the target input / output requests on the memory module 220 via the memory interface control circuit 213.
[0042] In one embodiment, the memory cell structure of the memory module 220 can be understood as a multi-layered physical organization architecture. Specifically, the memory module 220 includes multiple chips, each chip has multiple planes, and each plane contains multiple physical blocks, each physical block consisting of multiple physical pages. It should be noted that this disclosure is not limited to the specific size of each physical page and logical page.
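To make the chip / plane / block / page hierarchy of paragraph [0042] concrete, the sketch below decomposes a flat physical page index into its position at each level. The flat index, the `Geometry` and `PageLocation` types, and the page-major ordering are all assumptions chosen for illustration; the disclosure does not define an address layout.

```python
from typing import NamedTuple


class Geometry(NamedTuple):
    chips: int
    planes_per_chip: int
    blocks_per_plane: int
    pages_per_block: int


class PageLocation(NamedTuple):
    chip: int
    plane: int
    block: int
    page: int


def locate(flat_page: int, geo: Geometry) -> PageLocation:
    """Split a flat page index into (chip, plane, block, page)."""
    total = (geo.chips * geo.planes_per_chip
             * geo.blocks_per_plane * geo.pages_per_block)
    if not 0 <= flat_page < total:
        raise ValueError("page index outside the module")
    # Peel off the innermost level first, then move outward.
    rest, page = divmod(flat_page, geo.pages_per_block)
    rest, block = divmod(rest, geo.blocks_per_plane)
    chip, plane = divmod(rest, geo.planes_per_chip)
    return PageLocation(chip, plane, block, page)
```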
[0043] Through the above hardware architecture, the processor 211 in the memory controller 210 can perform a dual-objective quantitative evaluation and instruction execution order optimization on input / output requests based on the feature information and execution queue data temporarily stored in the buffer memory 214, and execute the scheduled target request on the memory module 220 through the memory interface control circuit 213, while forming a feedback path with the host system 10 through the queue depth descriptor.
[0044] Referring to Figure 2, in one embodiment, the memory management method provided in this disclosure is applied to a storage device 20 configured with a memory controller 210 and a memory module 220, and the processor 211 in the memory controller 210 acts as the execution entity. The memory management method includes steps S210 to S230.
[0045] In step S210, the processor 211 receives multiple input / output requests from the host system 10. Specifically, during the execution of an application, the host system 10 generates data read requests or data write requests for the storage device 20. These requests are transmitted to the memory controller 210 via the data transmission interface circuit 130 and the connection interface circuit 230. The processor 211 receives the multiple input / output requests via the data management circuit 212 and temporarily stores them in the buffer memory 214 for subsequent processing. In one embodiment, the multiple input / output requests may originate from different applications or different data streams running simultaneously on the host system 10, and therefore, the multiple input / output requests may exhibit different characteristics in terms of request size, physical address distribution, and priority.
[0046] Next, in step S220, the processor 211 extracts feature information for each input / output request from the multiple input / output requests, and calculates the throughput tendency and latency tendency for each input / output request based on the feature information. Specifically, for each input / output request temporarily stored in the buffer memory 214, the processor 211 reads feature information that characterizes the load attributes of the input / output request. Based on the extracted feature information, the processor 211 calculates two-dimensional tendency values for each input / output request according to a preset evaluation rule: throughput tendency and latency tendency. The throughput tendency reflects the degree to which the input / output request is inclined towards data transmission throughput capability, and the latency tendency reflects the degree to which the input / output request is inclined towards low-latency response. Thus, the processor 211 can convert the load characteristics of each input / output request from a qualitative categorical description into a quantitative two-dimensional numerical representation.
[0047] Further, in step S230, the processor 211 adjusts the queue depth of the storage device 20 based on the throughput tendency and latency tendency. Specifically, the processor 211 determines whether the overall load characteristics of the current input / output requests are biased towards throughput sensitivity or latency sensitivity based on the throughput tendency and latency tendency of each input / output request calculated in step S220, and adjusts the current queue depth of the storage device 20 accordingly. When the latency tendency of multiple input / output requests is generally high, the processor 211 reduces the queue depth to reduce the queuing time of each input / output request within the storage device 20; when the throughput tendency of multiple input / output requests is generally high, the processor 211 increases the queue depth to improve the parallel processing capability of the storage device 20 for multiple input / output requests. The processor 211 updates the adjusted queue depth value in the buffer memory 214 for maintenance.
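One possible reading of step S230 is sketched below in Python. The text only says that the queue depth is reduced when latency tendencies are "generally high" and increased when throughput tendencies are "generally high"; comparing the two averages, the fixed `step`, and the `min_depth` / `max_depth` limits are assumptions of this sketch, not values from the disclosure.

```python
from statistics import mean


def adjust_queue_depth(current_depth, tendencies,
                       step=4, min_depth=1, max_depth=64):
    """Return a new queue depth for a batch of requests.

    `tendencies` is a list of (throughput_tendency, latency_tendency)
    pairs, one per input / output request, each value between 0 and 1.
    """
    if not tendencies:
        return current_depth
    avg_throughput = mean(t for t, _ in tendencies)
    avg_latency = mean(l for _, l in tendencies)
    if avg_latency > avg_throughput:
        proposed = current_depth - step   # shallower queue, less queuing time
    elif avg_throughput > avg_latency:
        proposed = current_depth + step   # deeper queue, more parallelism
    else:
        proposed = current_depth
    # Keep the result inside the assumed hardware limits.
    return max(min_depth, min(max_depth, proposed))
```

A batch dominated by latency-sensitive requests lowers the depth by one step, a throughput-dominated batch raises it, and the clamp keeps the value within the assumed range.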
[0048] Through the processes described in steps S210 to S230, processor 211 implements a complete process of performing a dual-objective quantitative evaluation based on feature information from input / output requests and adjusting the queue depth of storage device 20 online according to the evaluation results. The specific extraction method of feature information and the calculation method of the tendencies in step S220, as well as the specific adjustment logic of queue depth in step S230, are described in further detail in subsequent embodiments in conjunction with Figure 3 and Figure 4.
[0049] Referring to Figure 3, the process shown in Figure 3 is a further refinement of step S220 in Figure 2. In one embodiment, the process by which the processor 211 extracts feature information of each input / output request and calculates throughput tendency and latency tendency includes steps S310 to S360.
[0050] In step S310, the processor 211 acquires the request size, physical address, and priority indication information of the input / output request. Specifically, in one embodiment, the feature information includes three dimensions: first, the request size of the input / output request, i.e., the amount of data to be read or written by the request; second, the physical address of the input / output request, i.e., the target storage location in the memory module 220 pointed to by the request; and third, the priority indication information of the input / output request, i.e., an identifier used to characterize the urgency or importance of the request. The processor 211 reads the feature information of the above three dimensions from each input / output request temporarily stored in the buffer memory 214 for subsequent score calculation.
[0051] In one embodiment, after acquiring feature information in three dimensions, the processor 211 performs quantization processing on each dimension to convert the original feature information into normalized score values. Steps S320, S330, and S340 can be executed in parallel or in any order.
[0052] In step S320, processor 211 normalizes the request size to obtain a size score. The size score is a value between 0 and 1, representing the relative size of the input / output request data. In one embodiment, processor 211 uses a segmented mapping rule to normalize the request size: when the request size is less than or equal to 32KB, the size score is mapped to 0; when the request size is between 32KB and 512KB, the size score is linearly mapped between 0 and 1 according to the relative position of the request size within the 32KB to 512KB range; when the request size is greater than or equal to 512KB, the size score is mapped to 1. The mapping rule remains continuous at each segment boundary: when the request size is exactly 32KB, the calculation result of the linear mapping segment is 0, consistent with the mapping value of the small segment; when the request size is exactly 512KB, the calculation result of the linear mapping segment is 1, consistent with the mapping value of the large segment. Through the above normalization process, request sizes of different magnitudes are uniformly mapped to the same numerical range without any step discontinuities, so that subsequent weighted calculations can be performed on the scores of other dimensions.
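The segmented mapping of step S320 can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the function name `size_score` and its parameter names are hypothetical, and the 32KB and 512KB thresholds are simply the example values given above.

```python
KB = 1024

def size_score(request_bytes: int, low: int = 32 * KB, high: int = 512 * KB) -> float:
    """Map a request size in bytes to a score in [0, 1] using three segments."""
    if request_bytes <= low:
        return 0.0  # small segment
    if request_bytes >= high:
        return 1.0  # large segment
    # Linear segment: relative position of the size inside (low, high).
    return (request_bytes - low) / (high - low)
```

Because the linear segment evaluates to 0 at 32KB and to 1 at 512KB, the mapping has no step at either boundary.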
[0053] In another embodiment, the specific threshold and mapping range of the segmented mapping can be adjusted according to the hardware characteristics and application scenarios of the storage device 20, and this disclosure is not limited to the specific values mentioned above.
[0054] In step S330, processor 211 determines the number of consecutive occurrences of adjacent input / output requests based on the physical address, and obtains a sequential score value based on the percentage of consecutive occurrences. The sequential score value is a value between 0 and 1, used to characterize the degree of continuity of the current batch of input / output requests in terms of physical address.
[0055] Specifically, in one embodiment, the processor 211 checks the physical address relationship of two adjacent input / output requests one by one within a sliding window. In one embodiment, if the difference between the starting logical block address of the i-th input / output request and the ending logical block address of the (i-1)-th input / output request is within a preset continuity threshold (e.g., 128KB), the processor 211 records this adjacent relationship as a sequential hit. The processor 211 counts the number of sequential hits within the sliding window and divides the number of sequential hits by the total number of comparisons within the sliding window; the resulting ratio is the sequentiality score. For example, if there are 10 adjacent comparisons within the sliding window, and 8 of them satisfy the continuity condition, the sequentiality score is 0.8, indicating that the current request flow has high physical address continuity.
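The sliding-window count described above might look like the sketch below. Several details are assumptions made for illustration: requests are represented as hypothetical `(start, length)` byte offsets, the gap is compared as an absolute value, and an empty or single-request window returns 0.

```python
from typing import List, Tuple

KB = 1024

def sequentiality_score(window: List[Tuple[int, int]], threshold: int = 128 * KB) -> float:
    """Fraction of adjacent request pairs whose addresses are nearly contiguous.

    `window` holds (start, length) pairs in arrival order, in bytes.
    """
    comparisons = len(window) - 1
    if comparisons <= 0:
        return 0.0  # assumption: no adjacent pair means no evidence of continuity
    hits = 0
    for (prev_start, prev_len), (cur_start, _) in zip(window, window[1:]):
        prev_end = prev_start + prev_len
        # Sequential hit: the next request starts within `threshold` of the previous end.
        if abs(cur_start - prev_end) <= threshold:
            hits += 1
    return hits / comparisons
```

With 11 requests (10 adjacent comparisons) of which 8 pairs are contiguous, the function returns 0.8, matching the example in the text.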
[0056] In step S340, the processor 211 obtains a priority score value based on the priority indication information of the input / output request. The priority score value is a numerical value between 0 and 1, used to characterize the urgency of the input / output request; a higher value indicates a higher priority. In one embodiment, the priority score value is determined based on at least one of the following: a priority tag carried by the input / output request, and a priority determined based on the historical access pattern of the data stream to which the input / output request belongs.
[0057] Specifically, the first approach is to directly read the priority flag carried in the input / output request. The priority flag is a priority field attached by the host system 10 when it sends a request to the storage device 20 through the storage protocol. The processor 211 directly determines the priority score based on the value of the priority flag.
[0058] The second approach is for the processor 211 to infer the priority of the current input / output request based on the historical access patterns of the data stream to which the input / output request belongs. Specifically, in one embodiment, the processor 211 maintains historical access records for each data stream and infers the priority of the current input / output request for that data stream based on the access patterns presented in the historical access records.
[0059] In one embodiment, if a data stream exhibits a small-block random read access pattern 50 times consecutively, the processor 211 determines the priority score of the current input / output request for that data stream as 0.6; if the consecutive number is 20, the priority score is determined as 0.3; if the consecutive number does not meet the above conditions, the priority score is determined as 0.1. In another embodiment, when an input / output request simultaneously possesses a carried priority flag and a priority inferred based on historical access patterns, the processor 211 may select the higher of the two values, or perform a weighted fusion of the two to determine the final priority score.
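A sketch of how the two priority sources could be combined is given below. The function names and the `fuse_weight` parameter are hypothetical, and treating the counts 50 and 20 as "at least" thresholds is an assumption; the text only gives the score for each count.

```python
from typing import Optional

def history_priority(consecutive_small_random_reads: int) -> float:
    """Priority inferred from the stream's recent access pattern."""
    if consecutive_small_random_reads >= 50:  # assumption: 50 or more
        return 0.6
    if consecutive_small_random_reads >= 20:  # assumption: 20 or more
        return 0.3
    return 0.1

def priority_score(flag_score: Optional[float], streak: int,
                   fuse_weight: Optional[float] = None) -> float:
    """Combine a host-supplied priority score with the inferred one.

    flag_score:  score derived from the carried priority flag, or None if absent.
    fuse_weight: if given, weight of flag_score in a weighted fusion;
                 otherwise the higher of the two values is used.
    """
    inferred = history_priority(streak)
    if flag_score is None:
        return inferred
    if fuse_weight is None:
        return max(flag_score, inferred)
    return fuse_weight * flag_score + (1.0 - fuse_weight) * inferred
```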
[0060] After completing the score calculation for the above three dimensions, the processor 211 calculates the throughput tendency and latency tendency based on the size score, the order score, and the priority score through two different weighted combinations.
[0061] In step S350, the processor 211 calculates the throughput tendency using a first weighted combination. The throughput tendency is calculated based on a first weighted combination of size score, order score, and priority score. In the first weighted combination, size score and order score participate in the calculation with positive weights, while priority score participates with its negative weight. The technical implication is that: a larger request size indicates that the request is more suitable for increasing data transmission rate by increasing parallelism, thus contributing positively to the throughput tendency; more contiguous physical addresses indicate that the request is more suitable for improving bandwidth utilization through sequential access, thus also contributing positively to the throughput tendency; and higher priority requests are generally more sensitive to latency, therefore the priority score participates in the throughput tendency calculation with its negative weight. In one embodiment, the calculation formula for the first weighted combination is expressed as:
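[0062] $$T = \alpha_1 S_{\text{size}} + \alpha_2 S_{\text{seq}} + \alpha_3 \left(1 - S_{\text{pri}}\right)$$
[0063] where $T$ is the throughput tendency, $S_{\text{size}}$ is the size score, $S_{\text{seq}}$ is the order score, $S_{\text{pri}}$ is the priority score (entering through its complement $1 - S_{\text{pri}}$), and $\alpha_1$, $\alpha_2$, $\alpha_3$ are the weight coefficients of the first weighted combination, satisfying $\alpha_1 + \alpha_2 + \alpha_3 = 1$. In one embodiment, $\alpha_1$, $\alpha_2$, and $\alpha_3$ each take a preset value.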
[0064] In step S360, the processor 211 calculates the latency tendency using a second weighted combination. The latency tendency is calculated based on a second weighted combination of size score, order score, and priority score. In the second weighted combination, the priority score participates in the calculation with a positive weight, while the size score and order score participate with their respective negative values. The technical implication is that: higher priority requests have a stronger need for low latency, thus contributing positively to the latency tendency; smaller request sizes indicate small-block random access, making them more sensitive to queuing latency, so the size score participates in the calculation with its negative value, resulting in a higher latency tendency for smaller requests; more discontinuous physical addresses indicate random access, requiring a reduction in queue depth to decrease queuing time, so the order score participates in the calculation with its negative value. In one embodiment, the calculation formula for the second weighted combination is expressed as:
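[0065] $$L = \beta_1 S_{\text{pri}} + \beta_2 \left(1 - S_{\text{size}}\right) + \beta_3 \left(1 - S_{\text{seq}}\right)$$
[0066] where $L$ is the latency tendency, $S_{\text{pri}}$ is the priority score, $S_{\text{size}}$ is the size score, $S_{\text{seq}}$ is the order score, and $\beta_1$, $\beta_2$, $\beta_3$ are the weight coefficients of the second weighted combination, satisfying $\beta_1 + \beta_2 + \beta_3 = 1$. In one embodiment, $\beta_1$, $\beta_2$, and $\beta_3$ each take a preset value. The weight coefficient of the priority score in the second weighted combination, $\beta_1$, is higher than the weight coefficient of the priority score in the first weighted combination. This reflects the technical characteristic that priority factors have a greater impact on latency requirements than on throughput requirements.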
[0067] To further illustrate the specific numerical effects of the above dual-objective evaluation process, two calculation examples are given below.
[0068] The first example is a 512KB sequential read request, with a size score of 1.0, an order score of 0.9, and a priority score of 0.1. Substituting these values into the formulas above yields the throughput tendency and the latency tendency. The results indicate that the sequential read request has a throughput tendency close to 1 and a latency tendency close to 0, meaning that the request is classified as a throughput-sensitive request in the dual-objective evaluation.
[0069] The second example is a 4KB random read request with a size score of 0, an order score of 0.1, and a priority score of 0.6. Substituting these values into the formulas yields a throughput tendency of 0.155 and a latency tendency of 0.785. The results show that the random read request has a low throughput tendency and a high latency tendency, meaning that the request is classified as a latency-sensitive request in the dual-objective evaluation. These two examples illustrate that the complementary weight design of the first and second weighted combinations can differentiate input / output requests with different load characteristics across both throughput and latency tendencies, providing a quantitative basis for queue depth adjustment in subsequent step S230.
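The two weighted combinations can be sketched numerically as below. Two things here are assumptions and not taken from the disclosure. First, a score that participates "with its negative value" is modeled as its complement (1 minus the score), with each weight set summing to 1. Second, the weight values are hypothetical: they were chosen only because they reproduce the 0.155 and 0.785 of the second example, and other weight sets reproduce the same numbers.

```python
from typing import Tuple

# Hypothetical weights (not from the disclosure); each triple sums to 1.
THROUGHPUT_WEIGHTS = (0.50, 0.15, 0.35)  # size, order, complement of priority
LATENCY_WEIGHTS = (0.50, 0.35, 0.15)     # priority, complement of size, complement of order

def tendencies(size: float, order: float, priority: float) -> Tuple[float, float]:
    """Return (throughput_tendency, latency_tendency) for scores in [0, 1]."""
    a1, a2, a3 = THROUGHPUT_WEIGHTS
    b1, b2, b3 = LATENCY_WEIGHTS
    # Large, sequential, low-priority requests score high on throughput.
    throughput = a1 * size + a2 * order + a3 * (1.0 - priority)
    # High-priority, small, random requests score high on latency.
    latency = b1 * priority + b2 * (1.0 - size) + b3 * (1.0 - order)
    return throughput, latency

# Second example from the text: 4KB random read.
t, l = tendencies(size=0.0, order=0.1, priority=0.6)
```

Under these assumed weights, the 512KB sequential read of the first example gives a throughput tendency above 0.9 and a latency tendency below 0.1, which agrees with the "close to 1" and "close to 0" description.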
[0070] Through the process of steps S310 to S360 above, the processor 211 establishes a complete transformation path from the original request attributes to the two-dimensional quantitative score, so that the input and output requests with different load characteristics can obtain a distinguishable numerical representation in the two dimensions of throughput tendency and latency tendency, providing quantitative input for the judgment logic of queue depth adjustment and instruction execution order optimization in subsequent steps.
[0071] Referring to Figure 4, the process shown in Figure 4 is a further refinement of step S230 in Figure 2. In one embodiment, the process by which the processor 211 adjusts the queue depth of the storage device 20 according to the throughput tendency and latency tendency includes steps S410 to S460, which are described one by one below.
[0072] In step S410, the processor 211 acquires the throughput tendency and latency tendency. Specifically, the processor 211 reads from the buffer memory 214 the throughput tendency and latency tendency of each input / output request calculated in steps S350 and S360 shown in Figure 3.
[0073] In one embodiment, the processor 211 may perform statistical summarization (e.g., take the average or take the weighted average) on the throughput tendency and latency tendency of the current batch of input and output requests to obtain the throughput tendency and latency tendency that characterize the current overall load characteristics, which are then used for subsequent threshold determination.
[0074] Next, in step S420, the processor 211 determines whether the latency tendency is greater than or equal to a first preset threshold, and whether the difference between the latency tendency and the throughput tendency is greater than or equal to a second preset threshold. The first preset threshold is used to determine whether the current load tendency has reached a level that requires triggering queue depth adjustment, and the second preset threshold is used to determine whether the distinction between the two tendencies is sufficient to support directional adjustment, avoiding unnecessary adjustments when the two tendencies are close. In one embodiment, the first preset threshold is 0.70, and the second preset threshold is 0.10.
[0075] When the judgment result of step S420 is "yes", that is, when the latency tendency is greater than or equal to the first preset threshold and the difference between the latency tendency and the throughput tendency is greater than or equal to the second preset threshold, it indicates that the load characteristics of the current input / output request are clearly biased towards latency sensitivity. In this case, the process proceeds to step S430, and the processor 211 reduces the queue depth by a first step value. The first step value refers to the adjustment amount by which the processor 211 reduces the queue depth each time. By reducing the queue depth, the number of input / output requests received by the storage device 20 at the same time is reduced, and the queuing waiting time of each input / output request within the storage device 20 is correspondingly shortened, thereby helping to reduce the response time of latency-sensitive requests.
[0076] When the judgment result of step S420 is "no", the process proceeds to step S440. In step S440, the processor 211 further determines whether the throughput tendency is greater than or equal to the first preset threshold, and whether the difference between the throughput tendency and the latency tendency is greater than or equal to the second preset threshold.
[0077] When the judgment result of step S440 is "yes", that is, when the throughput tendency is greater than or equal to the first preset threshold and the difference between the throughput tendency and the latency tendency is greater than or equal to the second preset threshold, it indicates that the load characteristics of the current input / output request are clearly biased towards throughput sensitivity. In this case, the process proceeds to step S450, and the processor 211 increases the queue depth by a second step value. The second step value refers to the adjustment range by which the processor 211 increases the queue depth each time. By increasing the queue depth, the storage device 20 can simultaneously accept more input / output requests for parallel processing, thereby improving the data transmission rate of throughput-sensitive requests.
[0078] When the judgment result of step S440 is "no", the load characteristics of the current input / output requests meet neither the latency-sensitive nor the throughput-sensitive triggering condition. This corresponds to a scenario where neither the throughput tendency nor the latency tendency has reached the first preset threshold, or where the difference between the two tendencies has not reached the second preset threshold. For example, when both the throughput tendency and the latency tendency are between 0.40 and 0.65, the current load exhibits mixed characteristics and its directionality is unclear. In this case, the process proceeds to step S460, and the processor 211 maintains the current queue depth unchanged to avoid frequent adjustments when the load characteristics are unclear.
[0079] After step S430, step S450 or step S460 is completed, the queue depth adjustment is completed, and the processor 211 updates the adjusted queue depth value (or the current value that remains unchanged) to the buffer memory 214 for maintenance.
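The decision logic of steps S420 to S460 can be sketched as a small function. The function name and the returned labels are hypothetical; the default thresholds 0.70 and 0.10 are the example values from the embodiment above, and the inputs are assumed to be the batch-level tendencies already summarized (for example, averaged).

```python
def decide_adjustment(latency: float, throughput: float,
                      first_threshold: float = 0.70,
                      second_threshold: float = 0.10) -> str:
    """Return "decrease", "increase", or "hold" for the queue depth."""
    # S420 -> S430: load is clearly latency-sensitive.
    if latency >= first_threshold and latency - throughput >= second_threshold:
        return "decrease"
    # S440 -> S450: load is clearly throughput-sensitive.
    if throughput >= first_threshold and throughput - latency >= second_threshold:
        return "increase"
    # S460: mixed or unclear load, keep the current depth.
    return "hold"
```

For instance, a latency tendency of 0.785 with a throughput tendency of 0.155 yields "decrease", while two tendencies of 0.75 and 0.72 yield "hold" because their difference is below the second threshold.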
[0080] It is worth mentioning that, in one embodiment, the first step value is greater than the second step value, forming an asymmetric stepping strategy of "fast decrease and slow increase". The technical consideration for this design is that when latency-sensitive requests occur in a concentrated manner, an excessively high queue depth will directly lead to an increase in the queuing waiting time for each request. Therefore, it is necessary to quickly reduce the queue depth with a larger step value to control latency. However, when throughput-sensitive requests dominate, the increase in queue depth has a gradual effect on throughput, and increasing the queue depth too quickly may cause a sudden increase in latency when the load characteristics change. Therefore, a smaller step value is used to conservatively increase the queue depth.
[0081] In one embodiment, the first step value (decrease) is 2 to 4, and the second step value (increase) is 1. Furthermore, the first step value can also be associated with the size of the current input / output request: when small random read requests occur in a concentrated manner, the processor 211 uses a larger first step value to accelerate the reduction of the queue depth; in scenarios where the queue depth increases, since large requests can already utilize the bandwidth of the storage device 20, the processor 211 uses a smaller second step value.
[0082] In another embodiment, the queue depth adjustment is limited to a preset queue depth range. Specifically, when the processor 211 reduces the queue depth in step S430, the queue depth adjustment is limited to a first queue depth range; when the processor 211 increases the queue depth in step S450, the queue depth adjustment is limited to a second queue depth range. The lower limit of the second queue depth range is greater than the upper limit of the first queue depth range, meaning the two queue depth ranges do not overlap numerically. In one embodiment, the lower limit of the first queue depth range is 1 to 2, and the upper limit is 4; the lower limit of the second queue depth range is 8, and the upper limit is 32. The upper limit of the first queue depth range is 4, while the lower limit of the second queue depth range is 8, and there is no overlapping interval between them. This design ensures that in latency-sensitive scenarios, the queue depth is limited to a lower range to guarantee low latency, while in throughput-sensitive scenarios, the queue depth is limited to a higher range to guarantee parallel processing capability. The interval between the two ranges (5 to 7 in the above example) corresponds to the hybrid scenario of maintaining the current queue depth in step S460, within which the processor 211 does not actively initiate adjustments.
[0083] Through the process of steps S410 to S460 above, the processor 211 realizes the directional adjustment of the queue depth: rapidly reducing the queue depth to shorten the queuing waiting time under latency-sensitive load, conservatively increasing the queue depth to improve parallel processing capability under throughput-sensitive load, and maintaining the current queue depth to avoid frequent adjustments under mixed load, and the adjustment results in each direction are limited by the corresponding queue depth range.
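The asymmetric stepping and the two depth ranges might be combined as in the sketch below. It reflects one possible reading of the text, namely that the adjusted value is clamped into the range associated with the adjustment direction; the function name, the action labels, and the default step of 2 (picked from the 2 to 4 range) are assumptions.

```python
from typing import Tuple

def adjust_queue_depth(depth: int, action: str,
                       decrease_step: int = 2, increase_step: int = 1,
                       low_range: Tuple[int, int] = (1, 4),
                       high_range: Tuple[int, int] = (8, 32)) -> int:
    """Apply a "fast decrease, slow increase" step and clamp to the matching range."""
    if action == "decrease":
        lo, hi = low_range   # first queue depth range (latency-sensitive)
        return max(lo, min(hi, depth - decrease_step))
    if action == "increase":
        lo, hi = high_range  # second queue depth range (throughput-sensitive)
        return max(lo, min(hi, depth + increase_step))
    return depth             # "hold": leave the depth unchanged
```

Under this reading, a decrease from a depth of 16 lands directly at 4 (the upper limit of the first range), and an increase never exceeds 32.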
[0084] Referring to Figure 5, in one embodiment, after completing the queue depth adjustment in step S230 of Figure 2, the processor 211 further optimizes the instruction execution order of multiple input / output requests. The instruction execution order optimization process includes steps S510 to S570.
[0085] In step S510, the processor 211 determines whether the physical address sequentiality of the current input / output request satisfies a preset order condition. The preset order condition is used to determine whether the input / output request belongs to a sequential request with consecutive physical addresses. In one embodiment, the processor 211 makes the determination based on the sequentiality score calculated in step S330 shown in Figure 3: when the sequentiality score of the request stream in which the input / output request is located is higher than a preset sequentiality threshold, the processor 211 determines that the physical address sequentiality of the input / output request satisfies the preset order condition.
[0086] In another embodiment, the processor 211 may also make a comprehensive judgment based on the request size. For example, when the request size is greater than a preset size threshold and the order score is higher than the order threshold, the processor determines that the input / output request meets the preset order condition.
[0087] When the judgment result of step S510 is "yes", the process proceeds to step S520. In step S520, the processor 211 maintains the original instruction order of the input / output request and does not adjust its execution order. The technical consideration for this processing is that sequential input / output requests with contiguous physical addresses (such as large block sequential read / write requests) can utilize the sequential access characteristics of the memory module 220 when executed in their original order. Reordering them may disrupt the contiguousness of physical addresses, causing the memory module 220 to generate additional random addressing operations. Therefore, the processor 211 directly sends input / output requests that meet the preset order conditions to the memory module 220 for execution in their original order via the memory interface control circuit 213.
[0088] When the judgment result of step S510 is "no", it indicates that the physical address sequence of the input / output request does not meet the preset order condition, and the request belongs to a non-sequential request (e.g., a small-size random read / write request). For such requests, the processor 211 allocates them to multiple execution queues according to the feature information, so that the target input / output request can be selected and executed from the multiple execution queues according to the preset execution selection rules. The multiple execution queues include a high-priority queue, a latency-sensitive queue, and a throughput-sensitive queue. The processor 211 determines which execution queue each non-sequential input / output request should be allocated to through the two-layer queuing judgment rules constituted by steps S530 and S550.
[0089] Specifically, in the first-level determination, the process proceeds to step S530, where the processor 211 determines whether the priority of the input / output request meets a preset high-priority condition. The high-priority condition is used to identify input / output requests that require the highest priority processing. In one embodiment, the high-priority condition refers to a foreground critical request explicitly marked as having the highest priority by the host system 10 through a storage protocol when issuing the input / output request, such as a request carrying a specific priority flag.
[0090] When the judgment result of step S530 is "yes", the process proceeds to step S540, and the processor 211 assigns the input / output request to the high-priority queue. The input / output request in the high-priority queue will receive the highest priority processing order in the subsequent execution selection process.
[0091] When the judgment result of step S530 is "no", the priority of the input / output request does not meet the high-priority condition. The process proceeds to the second level of judgment, namely step S550, where the processor 211 determines whether the latency tendency of the input / output request is higher than the throughput tendency. The processor 211 reads from the buffer memory 214 the throughput tendency and latency tendency calculated for the input / output request in steps S350 and S360 shown in Figure 3, and compares the two.
[0092] When the judgment result of step S550 is "yes", that is, the latency tendency is higher than the throughput tendency, it means that the input / output request is more inclined to a low-latency response in the dual-objective evaluation. The process proceeds to step S560, where processor 211 allocates the input / output request to the latency-sensitive queue. Input / output requests allocated to the latency-sensitive queue will receive a higher priority in the subsequent execution selection process compared to requests in the throughput-sensitive queue, in order to ensure their latency requirements.
[0093] When the judgment result of step S550 is "No", that is, the throughput tendency is higher than or equal to the latency tendency, it means that the input / output request is more inclined to improve data transmission throughput in the dual-objective evaluation. The process proceeds to step S570, where the processor 211 allocates the input / output request to the throughput-sensitive queue. The input / output requests allocated to the throughput-sensitive queue will be organized into batch execution in the subsequent execution selection process to utilize the parallel processing capability of the storage device 20 to improve the overall data transmission rate.
[0094] Through the process of steps S510 to S570 described above, processor 211 implements a two-level instruction execution order optimization strategy.
[0095] Specifically, in the first level, the processor 211 separates sequential requests with contiguous physical addresses from non-sequential requests, and maintains the original instruction order for sequential requests to preserve sequential access characteristics.
[0096] In the second level, the processor 211 further allocates non-sequential requests to three execution queues according to a two-level queuing decision rule: requests that meet the priority condition for high priority enter the high priority queue, requests with latency tendency higher than throughput tendency enter the latency sensitive queue, and the remaining requests enter the throughput sensitive queue.
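The routing of steps S510 to S570 can be sketched as a single classification function. The `Request` fields, the label strings, and the sequentiality threshold of 0.8 are hypothetical; the disclosure does not give a value for the preset sequentiality threshold.

```python
from dataclasses import dataclass

@dataclass
class Request:
    seq_score: float            # sequentiality score of the request's stream
    is_top_priority: bool       # host explicitly marked the request as highest priority
    latency_tendency: float
    throughput_tendency: float

def classify(req: Request, seq_threshold: float = 0.8) -> str:
    """Return where the request goes: kept in order, or one of three queues."""
    if req.seq_score > seq_threshold:                   # S510 -> S520
        return "in_order"
    if req.is_top_priority:                             # S530 -> S540
        return "high_priority"
    if req.latency_tendency > req.throughput_tendency:  # S550 -> S560
        return "latency_sensitive"
    return "throughput_sensitive"                       # S570 (ties included)
```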
[0097] It should be noted that the specific execution selection rules for input / output requests in the three execution queues are described in subsequent embodiments in conjunction with Figure 6, and the safety constraints for adjusting the execution order of input / output requests are described in subsequent embodiments in conjunction with Figure 7.
[0098] Referring to Figure 6, in one embodiment, after allocating non-sequential input / output requests to the high-priority queue, the latency-sensitive queue, and the throughput-sensitive queue through steps S510 to S570 shown in Figure 5, the processor 211 selects target input / output requests from the multiple execution queues for execution according to preset execution selection rules. The determination process of the execution selection rules includes steps S610 to S680, which are explained one by one below.
[0099] In step S610, the processor 211 determines whether the high-priority queue is non-empty. The high-priority queue stores the input / output requests that met the high-priority condition and were allocated to it in step S540 shown in Figure 5. The processor 211 first checks whether there are any input / output requests waiting to be executed in the high-priority queue maintained in the buffer memory 214.
[0100] When the judgment result of step S610 is "yes", that is, there is an input / output request waiting to be executed in the high-priority queue, the process proceeds to step S620. In step S620, the processor 211 preferentially selects the target input / output request from the high-priority queue. Thus, the input / output request explicitly marked as the highest priority by the host system 10 can obtain the highest priority execution opportunity, regardless of the queuing status in the other two execution queues.
[0101] When the judgment result of step S610 is "no", that is, the high-priority queue is empty, the process proceeds to step S630. In step S630, processor 211 detects whether there are input / output requests in the latency-sensitive queue whose waiting time exceeds a preset time threshold. The preset time threshold is used to identify requests that have been waiting for too long in the latency-sensitive queue, and if these requests continue to wait, their response delay will exceed an acceptable range. The waiting time referred to here is the duration experienced by the input / output request from entering the latency-sensitive queue to the current moment. In one embodiment, the preset time threshold is 1 millisecond. When the waiting time of an input / output request in the latency-sensitive queue exceeds the preset time threshold, the request has the opportunity to have its execution priority increased through a timeout mechanism, which is referred to in this disclosure as timeout priority escalation.
[0102] When the judgment result of step S630 is "yes", that is, there is an input / output request in the latency-sensitive queue whose waiting time exceeds the preset time threshold, the process proceeds to step S640. In step S640, the processor 211 selects a target input / output request from the latency-sensitive queue. In one embodiment, when there are multiple input / output requests exceeding the preset time threshold in the latency-sensitive queue, the processor 211 selects the request with the longest waiting time as the target input / output request.
[0103] When the judgment result of step S630 is "no", that is, the high-priority queue is empty and there are no input / output requests exceeding the preset time threshold in the latency-sensitive queue, the process proceeds to step S650. In step S650, the processor 211 selects a target input / output request from the throughput-sensitive queue. Thus, when there are no urgent requests to process in either the high-priority queue or the latency-sensitive queue, the processor 211 allocates execution opportunities to requests in the throughput-sensitive queue to utilize the parallel processing capability of the storage device 20 to improve the data transmission rate.
[0104] After step S620 or step S640 is completed, the process proceeds to step S660. In step S660, the processor 211 determines whether the number of input / output requests consecutively selected from the high-priority queue or the latency-sensitive queue has reached a preset number. The preset number is used to limit how many consecutive execution opportunities the high-priority queue and the latency-sensitive queue can occupy, avoiding starvation caused by input / output requests in the throughput-sensitive queue going unexecuted for a long time. This disclosure refers to this mechanism as an anti-starvation mechanism.
[0105] When the judgment result of step S660 is "yes", that is, the processor 211 has continuously selected a preset number of input / output requests from the high-priority queue or the latency-sensitive queue for execution, the process proceeds to step S670. In step S670, the processor 211 selects at least one input / output request from the throughput-sensitive queue for execution. Thus, even in scenarios where latency-sensitive loads continue to arrive, input / output requests in the throughput-sensitive queue can still obtain periodic execution opportunities, avoiding the request starvation problem caused by scheduling bias. After step S670 is completed, the processor 211 resets the continuous selection count to zero, and subsequent execution selection starts again from step S610.
[0106] When the judgment result of step S660 is "no", that is, the number of consecutive selections has not yet reached the preset number, the process directly proceeds to step S680.
[0107] In step S680, the processor 211 executes the target input / output request. Specifically, after the target input / output request selected in step S620, S640, S650, or S670 passes the rearrangement constraint check shown in Figure 7, the processor 211 performs the corresponding data read operation or data write operation on the memory module 220 through the data management circuit 212 and the memory interface control circuit 213.
[0108] Through the processes described in steps S610 to S680, the execution selection rule establishes a three-tiered scheduling priority: the first tier consists of requests in the high-priority queue, which always receive the highest priority execution opportunity; the second tier consists of requests in the latency-sensitive queue that have timed out and received priority due to waiting times exceeding a preset time threshold; and the third tier consists of requests in the throughput-sensitive queue, which receive execution opportunities when there are no urgent requests in the first two tiers. Simultaneously, the anti-starvation mechanism ensures scheduling fairness among the three execution queues by forcibly selecting at least one request from the throughput-sensitive queue after each consecutive selection of a preset number of high-priority or latency-sensitive requests.
[0109] Through the above execution selection rules, the processor 211 establishes a selection mechanism that takes into account both response priority and scheduling fairness among the three execution queues: high-priority requests always get the highest priority for execution, latency-sensitive requests avoid waiting for too long through a timeout priority escalation mechanism, and throughput-sensitive requests get periodic execution opportunities through an anti-starvation mechanism. The three achieve dynamic balance under different load scenarios.
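The execution selection rule summarized above can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the class name `Selector`, the anti-starvation count `STARVATION_LIMIT = 8`, and the fallback that serves latency-sensitive requests that have not timed out only when the throughput-sensitive queue is empty are all assumptions made for the example. The 1 millisecond timeout is the example value given in the embodiments.

```python
from collections import deque

TIMEOUT_S = 0.001      # preset time threshold (1 ms in the embodiments)
STARVATION_LIMIT = 8   # hypothetical preset number; the text gives no value


class Selector:
    """Illustrative three-tier selector with an anti-starvation counter."""

    def __init__(self):
        self.high = deque()        # high-priority queue
        self.latency = deque()     # latency-sensitive queue: (enqueue_time, request)
        self.throughput = deque()  # throughput-sensitive queue
        self.streak = 0            # consecutive high-priority / latency-sensitive picks

    def select(self, now):
        # Anti-starvation: after STARVATION_LIMIT consecutive picks from the
        # first two queues, force one throughput-sensitive request.
        if self.streak >= STARVATION_LIMIT and self.throughput:
            self.streak = 0
            return self.throughput.popleft()
        # Tier 1: high-priority requests always go first.
        if self.high:
            self.streak += 1
            return self.high.popleft()
        # Tier 2: latency-sensitive requests that waited longer than the threshold.
        if self.latency and now - self.latency[0][0] > TIMEOUT_S:
            self.streak += 1
            return self.latency.popleft()[1]
        # Tier 3: throughput-sensitive requests.
        if self.throughput:
            self.streak = 0
            return self.throughput.popleft()
        # Assumption: latency-sensitive requests that have not timed out are
        # served when nothing else is waiting.
        if self.latency:
            self.streak += 1
            return self.latency.popleft()[1]
        return None
```

With ten high-priority requests and one throughput-sensitive request queued, this sketch serves eight high-priority requests, then the throughput-sensitive one, then resumes with the high-priority queue.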
[0110] Referring to Figure 7, in one embodiment, after the processor 211 determines the target input / output request through the multi-queue allocation shown in Figure 5 and the execution selection rules shown in Figure 6, a rearrangement constraint check is required before the execution order of the target input / output request is actually adjusted. The rearrangement constraint ensures that adjusting the execution order does not violate data integrity or the correctness of instruction semantics. The process of determining the rearrangement constraint includes steps S710 to S770, which are described one by one below.
[0111] In step S710, the processor 211 acquires the target input / output request to be executed and its feature information. Specifically, the processor 211 reads from the buffer memory 214 the target input / output request selected by the execution selection rules shown in Figure 6, together with the feature information of the request extracted in step S310 shown in Figure 3 (including request size, physical address, and priority indication information). Based on this information, the processor 211 determines whether the execution order adjustment of the target input / output request is restricted by the no-boundary-crossing rule and the maximum jump amplitude.
[0112] Next, in step S720, the processor 211 determines whether adjusting the execution order of the target input / output request will cross a barrier-type command or change the execution order of read / write requests with data dependencies. Barrier-type commands, as referred to here, are commands issued by the host system 10 with forced flush semantics (e.g., flush commands, force unit access commands, or barrier commands). These commands require all preceding input / output requests to be completed before subsequent input / output requests can begin execution. Data dependencies, as referred to here, are write-before-read or read-before-write relationships between two input / output requests to the same physical address. Adjusting the execution order of such requests may result in reading stale data or overwriting unread data. These two scenarios together constitute the no-boundary-crossing rule defined in this disclosure.
[0113] When the judgment result of step S720 is "yes," meaning that adjusting the execution order of the target input / output request would cross barrier-type commands or change the execution order of read / write requests with data dependencies, the process proceeds to step S730. In step S730, processor 211 prohibits adjusting the execution order of the target input / output request, maintaining its original order. Thus, processor 211 ensures the correctness of instruction semantics and the integrity of data through the no-boundary-crossing rule. The process then jumps to step S770.
[0114] When the judgment result of step S720 is "no", that is, the adjustment of the execution order of the target input / output request does not violate the no-boundary-crossing rule, the process proceeds to step S740. In step S740, the processor 211 determines the maximum jump amplitude of the target input / output request based on its feature information. The maximum jump amplitude refers to the maximum distance that the input / output request is allowed to move forward or backward in the execution queue relative to its original position, and it limits the range of positional changes caused by the reordering of a single request.
[0115] Specifically, the maximum jump amplitude is determined based on the feature information and is limited by a preset rearrangement window. The rearrangement window refers to the maximum length of the request sequence considered by the processor 211 when adjusting the execution order; the maximum jump amplitude must not exceed the rearrangement window. Within the rearrangement window, the maximum jump amplitude is positively correlated with the priority of the target input / output request, negatively correlated with its size, and negatively correlated with its physical address order. The technical rationale is as follows: higher-priority requests have a more urgent need for response time and are therefore allowed to move a greater distance forward in the execution queue; smaller requests occupy fewer processing resources in the storage device 20, so moving them ahead in the queue has a smaller impact on other requests, and they are therefore allowed a larger jump amplitude; random requests with lower physical address order need more reordering to reduce queuing latency and are therefore allowed a larger jump amplitude. In one embodiment, the preset value of the rearrangement window is 16, and the preset value of the maximum jump amplitude is 8.
[0116] Further, in step S750, the processor 211 determines whether the actual adjustment range of the target input / output request exceeds the rearrangement window or the maximum jump amplitude. The actual adjustment range refers to the distance between the adjusted position of the target input / output request in the execution queue and its original position.
[0117] When the judgment result of step S750 is "yes", that is, the actual adjustment range exceeds the rearrangement window or the maximum jump amplitude, the process proceeds to step S760. In step S760, the processor 211 limits the adjustment range of the target input / output request to the range allowed by the rearrangement window and the maximum jump amplitude. Specifically, the processor 211 truncates the actual forward or backward movement distance of the request to the smaller of the maximum jump amplitude and the rearrangement window. The process then proceeds to step S770.
[0118] When the judgment result of step S750 is "no", that is, the actual adjustment range exceeds neither the rearrangement window nor the maximum jump amplitude, the current execution order adjustment is within the safety constraints, and the process directly proceeds to step S770.
[0119] In step S770, the processor 211 executes the target input / output request to the memory module 220 via the data management circuit 212 and the memory interface control circuit 213 according to the execution order after the above constraint check.
[0120] Through the dual constraints of the no-boundary-crossing rule and the maximum jump amplitude described above, the processor 211 optimizes the instruction execution order while ensuring data integrity and the correctness of instruction semantics. Furthermore, by associating the maximum jump amplitude with the feature information of the request, the adjustment range of the execution order can be adaptively controlled according to the priority, size, and physical address order of each request.
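The rearrangement constraint check of steps S710 to S770 can be sketched as follows. The names `Request`, `crosses_boundary`, `max_jump`, and `constrained_position` are hypothetical. The disclosure states only the direction of each correlation, so the averaging formula in `max_jump` is an assumption chosen to satisfy those correlations; the window of 16 and the preset amplitude of 8 are the example values of the embodiment.

```python
from dataclasses import dataclass

REARRANGEMENT_WINDOW = 16  # preset value from the embodiment
MAX_JUMP_PRESET = 8        # preset value from the embodiment


@dataclass
class Request:
    address: int              # physical address
    is_write: bool
    is_barrier: bool = False  # flush or other barrier-type command


def crosses_boundary(queue, src, dst):
    """True if moving queue[src] to index dst would cross a barrier-type
    command or a read/write pair on the same physical address."""
    target = queue[src]
    if target.is_barrier:
        return True
    lo, hi = (dst, src) if dst < src else (src + 1, dst + 1)
    for other in queue[lo:hi]:
        if other.is_barrier:
            return True
        if other.address == target.address and other.is_write != target.is_write:
            return True
    return False


def max_jump(priority, size_score, order_score):
    """Hypothetical mapping (scores in [0, 1]): grows with priority,
    shrinks with request size and physical address order."""
    factor = (priority + (1.0 - size_score) + (1.0 - order_score)) / 3.0
    return min(REARRANGEMENT_WINDOW, round(MAX_JUMP_PRESET * factor))


def constrained_position(queue, src, dst, priority, size_score, order_score):
    """Return the position actually allowed for a requested move src -> dst."""
    if crosses_boundary(queue, src, dst):                 # S720 -> S730
        return src                                        # keep original order
    limit = max_jump(priority, size_score, order_score)   # S740
    move = dst - src
    if abs(move) > limit:                                 # S750 -> S760
        move = limit if move > 0 else -limit              # truncate the move
    return src + move                                     # S770
```

Under these assumptions, a request asked to move ten positions forward is truncated to a move of eight, and a request whose move would pass a flush command stays where it is.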
[0121] Referring to Figure 8, the interaction process among four participants (the host system 10, the processor 211, the buffer memory 214, and the memory module 220) is illustrated. The interaction process is divided into two phases: a storage device-side processing phase and a host-side response phase.
[0122] During the storage device-side processing phase, steps S810 to S840 describe the complete process by which the processor 211 receives input / output requests and then feeds back queue depth change information to the host system 10.
[0123] In step S810, the host system 10 sends multiple input / output requests to the processor 211. Specifically, the host system 10 transmits the multiple input / output requests to the memory controller 210 via the data transmission interface circuit 130 and the connection interface circuit 230, and the data management circuit 212 receives and forwards them to the processor 211. This step corresponds to step S210 shown in Figure 2.
[0124] In step S820, the processor 211 extracts feature information from the multiple input / output requests and calculates the throughput tendency and latency tendency for each input / output request. As shown in Figure 8, step S820 is an internal operation of the processor 211. Following the flow of steps S310 to S360 shown in Figure 3, the processor 211 obtains the request size, physical address, and priority indication information from each input / output request, and calculates the throughput tendency and the latency tendency using a first weighted combination and a second weighted combination, respectively. The processor 211 writes the calculation results into the buffer memory 214 for temporary storage. This step corresponds to step S220 shown in Figure 2.
[0125] Next, in step S821, the processor 211 issues a queue depth adjustment instruction to the buffer memory 214. Following the flow of steps S410 to S460 shown in Figure 4, the processor 211 performs a dual-threshold determination based on the throughput tendency and the latency tendency to determine whether the queue depth should be reduced by the first step value, increased by the second step value, or maintained at the current queue depth. The adjusted queue depth value is then updated in the buffer memory 214. This step corresponds to step S230 shown in Figure 2.
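The dual-threshold determination referred to in step S821 can be sketched as follows. The thresholds 0.70 and 0.10 and the first step value of 4 are the example values of the embodiments. The second step value of 2, the floor `MIN_DEPTH`, the ceiling `MAX_DEPTH`, and the function name `adjust_queue_depth` are assumptions introduced only for this illustration.

```python
FIRST_THRESHOLD = 0.70   # tendency threshold (example value in the embodiments)
SECOND_THRESHOLD = 0.10  # minimum gap between the two tendencies
FIRST_STEP = 4           # decrease step (example value in the embodiments)
SECOND_STEP = 2          # hypothetical increase step, smaller than FIRST_STEP
MIN_DEPTH = 2            # hypothetical floor inside the first queue depth range
MAX_DEPTH = 24           # hypothetical ceiling inside the second queue depth range


def adjust_queue_depth(depth, throughput, latency):
    """Return the new queue depth for one adjustment round."""
    # Latency-dominated load: shrink quickly by the first step value.
    if latency >= FIRST_THRESHOLD and latency - throughput >= SECOND_THRESHOLD:
        return max(MIN_DEPTH, depth - FIRST_STEP)
    # Throughput-dominated load: grow gradually by the second step value.
    if throughput >= FIRST_THRESHOLD and throughput - latency >= SECOND_THRESHOLD:
        return min(MAX_DEPTH, depth + SECOND_STEP)
    # Neither condition holds: keep the current queue depth.
    return depth
```

Requiring both a high tendency and a minimum gap between the two tendencies keeps the depth unchanged for ambiguous loads, for example when both tendencies are near 0.5.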
[0126] In step S822, the processor 211 sends an instruction to the buffer memory 214 to select a target input / output request. Following the order-preservation determination and multi-queue allocation process shown in Figure 5 and the execution selection rules shown in Figure 6, the processor 211 selects the target input / output request from the multiple execution queues (including the high-priority queue, the latency-sensitive queue, and the throughput-sensitive queue) maintained in the buffer memory 214.
[0127] In step S823, the buffer memory 214 returns the selected target input / output request to the processor 211. As shown in Figure 8, step S823 is represented by a dashed arrow, indicating the return direction of the response message. After receiving the target input / output request, the processor 211 follows the rearrangement constraint determination process shown in Figure 7 to check whether the execution order adjustment of the request satisfies the no-boundary-crossing rule and the maximum jump amplitude constraint.
[0128] In step S824, the processor 211 executes a target input / output request to the memory module 220 via the memory interface control circuit 213. Specifically, the processor 211 issues corresponding data read or data write operations to the memory module 220 through the data management circuit 212 and the memory interface control circuit 213, based on the execution order determined after the rearrangement constraint check.
[0129] In step S830, the processor 211 updates the queue depth descriptor of the storage device 20. As shown in Figure 8, step S830 is an internal operation of the processor 211. After completing the queue depth adjustment in step S821, the processor 211 writes the adjusted queue depth value into the queue depth descriptor. The queue depth descriptor is stored in the buffer memory 214 and is a parameter field of the storage device 20 that the host system 10 can read through the storage protocol.
[0130] It is worth mentioning that, in one embodiment, when the storage device 20 and the host system 10 use the UFS communication protocol, the queue depth descriptor corresponds to the bQueueDepth field of the device descriptor in the UFS specification. In related technologies, the bQueueDepth field is typically read by the host system 10 during the initialization phase of the storage device 20 and used as a fixed parameter. In this disclosure, the processor 211 extends the bQueueDepth field to a parameter that can be dynamically updated at runtime. That is, after each adjustment of the queue depth based on throughput and latency tendencies, the processor 211 writes the adjusted queue depth value to the bQueueDepth field, so that the value of this field can reflect the current queue depth state of the storage device 20. In another embodiment, when the storage device 20 and the host system 10 use the NVMe communication protocol, the processor 211 can expose the adjusted queue depth value to the host system 10 through the vendor-specific feature identifier or vendor-specific log page in the NVMe specification.
[0131] In step S840, processor 211 sends a queue depth change notification to host system 10. Specifically, processor 211 notifies host system 10 of a queue depth descriptor change via connection interface circuit 230 and data transmission interface circuit 130 by triggering an asynchronous event notification or hardware interrupt signal. In one embodiment, when storage device 20 and host system 10 use the UFS communication protocol, processor 211 sends a queue depth change notification to host system 10 by setting an exception event status flag in the UPIU response. After detecting that the exception event status flag is set, processor 110 of host system 10 issues a descriptor read request to storage device 20 to obtain the updated bQueueDepth value. In another embodiment, when storage device 20 and host system 10 use the NVMe communication protocol, processor 211 sends a queue depth change notification to host system 10 by completing an asynchronous event request (AER) command pre-submitted by host system 10. After receiving the completion notification of the AER command, the processor 110 of the host system 10 reads the corresponding vendor-defined log page to obtain the updated queue depth value. Furthermore, under the NVMe communication protocol, the processor 211 can also send a hardware interrupt signal to the host system 10 via Message Signaled Interrupt (MSI-X) so that the host system 10 can read the updated queue depth descriptor and adjust the subsequent input / output request delivery strategy accordingly.
[0132] In the host-side response phase, steps S850 to S880 describe the response process of the host system 10 after receiving a queue depth change notification. This phase corresponds to the process by which the host system 10 adjusts its distribution strategy in response to the queue depth change.
[0133] In step S850, the host system 10 sends a request to the processor 211 to read the queue depth descriptor. Specifically, after receiving the queue depth change notification sent in step S840, the processor 110 of the host system 10, in response to the change in the queue depth descriptor, sends a descriptor read request to the memory controller 210 via the data transmission interface circuit 130 and the connection interface circuit 230.
[0134] In step S860, the processor 211 returns the updated queue depth value to the host system 10. As shown in Figure 8, step S860 is indicated by a dashed arrow, representing the return direction of the response message. The processor 211 reads the current queue depth value recorded in the queue depth descriptor from the buffer memory 214, and returns the queue depth value to the host system 10 via the connection interface circuit 230 and the data transmission interface circuit 130.
[0135] In step S870, the processor 110 of the host system 10 adjusts the strategy for issuing subsequent input / output requests based on the queue depth value. As shown in Figure 8, step S870 is an internal operation of the host system 10. The issuing strategy includes adjusting the number of input / output requests simultaneously issued to the storage device 20 so that it does not exceed the queue depth value. For example, if the queue depth value read by the processor 110 decreases from 24 to 4, the processor 110 adjusts the number of subsequent input / output requests simultaneously issued to the storage device 20 from 24 to no more than 4, so that the storage device 20 does not receive too many concurrent requests, which would increase the queuing time of each request.
[0136] In step S880, the host system 10 sends subsequent input / output requests to the processor 211 according to the adjusted sending strategy. Thus, the request sending behavior of the host system 10 is consistent with the current queue depth of the storage device 20, forming a complete bidirectional feedback loop. After step S880, the processor 211 will re-execute the processing flow of steps S820 to S840 for newly received input / output requests, realizing the online adaptive adjustment of the storage device 20 to continuously changing load characteristics.
[0137] It is worth mentioning that, in one embodiment, the host system 10 detects changes in the queue depth descriptor in two modes.
[0138] Specifically, the first mode is interrupt-driven mode, where the host system 10 actively initiates the descriptor read operation in step S850 after receiving the asynchronous event notification or hardware interrupt signal sent by the storage device 20 in step S840. This mode has low response latency and is suitable for scenarios where the queue depth changes frequently. The second mode is polling mode, where the processor 110 of the host system 10 actively queries the value of the queue depth descriptor at preset periodic intervals and performs subsequent strategy adjustments when a change in value is detected. Polling mode is relatively simple to implement, but the response latency depends on the polling period setting.
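The two detection modes can be sketched on the host side as follows. The class `QueueDepthWatcher` and its callables `read_descriptor` and `on_change` are hypothetical stand-ins for the protocol-specific descriptor read and the strategy adjustment; they do not correspond to any real driver interface. The sketch only illustrates that both modes perform the same read-and-compare operation and differ in what triggers it.

```python
class QueueDepthWatcher:
    """Illustrative host-side watcher for the queue depth descriptor."""

    def __init__(self, read_descriptor, on_change):
        self.read_descriptor = read_descriptor  # callable returning the current depth
        self.on_change = on_change              # callback taking the new depth
        self.last = read_descriptor()           # value known at initialization

    def _refresh(self):
        value = self.read_descriptor()
        if value != self.last:
            self.last = value
            self.on_change(value)
            return True
        return False

    def on_interrupt(self):
        # Interrupt-driven mode: read right after the device's notification.
        return self._refresh()

    def poll(self):
        # Polling mode: invoked by a periodic timer; latency depends on the period.
        return self._refresh()
```

In this sketch a poll that finds an unchanged value does nothing, so a short polling period costs extra descriptor reads but never triggers spurious strategy adjustments.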
[0139] Through the timing interaction of steps S810 to S880, a complete bidirectional feedback path is established between the storage device 20 and the host system 10: after adjusting the queue depth according to the load characteristics of the input and output requests, the storage device 20 synchronizes the adjustment results with the host system 10 through the queue depth descriptor, and the host system 10 adjusts the number of requests sent accordingly, so that the behavior of both ends remains coordinated and consistent.
[0140] Referring to Figure 9, this disclosure also provides an input / output request scheduling method applied to a host system 10 communicatively connected to a storage device 20. The storage device 20 is configured with a memory controller 210 and a memory module 220. The input / output request scheduling method is executed by the processor 110 of the host system 10 and includes steps S910 to S940.
[0141] In step S910, the processor 110 sends multiple input / output requests to the storage device 20. Specifically, in one embodiment, the processor 110 generates multiple input / output requests for the storage device 20 based on the data access requirements generated by the application running on the host system 10, and sends the multiple input / output requests to the connection interface circuit 230 of the storage device 20 via the data transmission interface circuit 130.
[0142] In this embodiment, the processor 110, in its initial state, controls the number of input / output requests simultaneously sent to the storage device 20 according to the currently known queue depth value. After receiving the multiple input / output requests, the processor 211 of the storage device 20 follows the processes shown in Figures 2 to 7 to perform operations such as feature information extraction, dual-target tendency calculation, queue depth adjustment, and instruction execution order optimization on the multiple input / output requests.
[0143] Next, in step S920, the processor 110 detects changes in the queue depth descriptor of the storage device 20. The queue depth descriptor is dynamically updated by the processor 211 of the storage device 20 based on the throughput and latency tendencies of the multiple input / output requests, and its update process corresponds to step S830 shown in Figure 8.
[0144] In one embodiment, the processor 110 detects changes in the queue depth descriptor via an interrupt-driven mode: after receiving an asynchronous event notification or hardware interrupt signal sent by the storage device 20 in step S840, the processor 110 determines that the queue depth descriptor has changed. In another embodiment, the processor 110 detects changes in the queue depth descriptor via a polling mode: the processor 110 actively reads the current value of the queue depth descriptor in the storage device 20 at preset periodic intervals via the data transmission interface circuit 130, and compares it with the previously read value; if the two are different, it determines that the queue depth descriptor has changed.
[0145] It should be noted that, in one embodiment, when the storage device 20 and the host system 10 use the UFS communication protocol, the processor 110 determines in step S920 whether the queue depth descriptor has changed by detecting the exception event status flag in the response UPIU, and after determining that it has changed, reads the bQueueDepth field in the device descriptor by sending a query request UPIU to obtain the updated queue depth value. In another embodiment, when the storage device 20 and the host system 10 use the NVMe communication protocol, the processor 110 determines in step S920 whether the queue depth descriptor has changed by receiving the completion notification of the asynchronous event request command, and after determining that it has changed, reads the vendor-specific log page by sending a Get Log Page command to obtain the updated queue depth value. The specific protocol commands and data structures described above are merely illustrative examples. Technical features such as the queue depth descriptor and the queue depth change notification defined in this disclosure are not limited to the implementation of a specific storage protocol. Those skilled in the art can select the corresponding implementation means according to the storage protocol adopted by the storage device 20.
[0146] Further, in step S930, the processor 110 reads the updated queue depth value in response to a change in the queue depth descriptor. Specifically, after detecting a change in the queue depth descriptor in step S920, the processor 110 sends a descriptor read request to the memory controller 210 via the data transmission interface circuit 130 and the connection interface circuit 230. The processor 211 of the storage device 20 reads the current queue depth value recorded by the queue depth descriptor from the buffer memory 214 and returns the queue depth value to the host system 10 via the connection interface circuit 230 and the data transmission interface circuit 130. The processor 110 receives the queue depth value and stores it in the host memory 120 for use in subsequent steps. This interaction process corresponds to steps S850 and S860 shown in Figure 8.
[0147] In step S940, the processor 110 adjusts the strategy for sending subsequent input / output requests based on the queue depth value. The strategy includes adjusting the number of input / output requests sent simultaneously to the storage device 20 to ensure it does not exceed the queue depth value. Specifically, the processor 110 updates the current allowed number of concurrent requests for the storage device 20 recorded in the host memory 120 to the queue depth value read in step S930, and controls the number of requests sent to the storage device 20 within the same time period to not exceed the queue depth value when sending subsequent input / output requests. For example, when the queue depth value read by the processor 110 decreases from 24 to 2, the processor 110 adjusts the number of subsequent simultaneous input / output requests to no more than 2, thereby matching the request sending rate of the host system 10 with the reduced queue depth of the storage device 20 under latency-sensitive load, preventing the storage device 20 from experiencing increased queuing time for each request due to receiving too many concurrent requests. For example, when the queue depth value read by the processor 110 increases from the previous 4 to 24, the processor 110 adjusts the number of subsequent simultaneous input / output requests to no more than 24, so that the storage device 20 can improve its parallel processing capability under throughput-sensitive loads by utilizing the increased queue depth.
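The host-side adjustment of step S940 can be sketched as a simple concurrency cap. The class `HostDispatcher` and its methods are hypothetical names used for illustration; the sketch assumes that the host tracks the number of outstanding requests and sends new ones only while that number is below the most recently read queue depth value.

```python
class HostDispatcher:
    """Illustrative host-side limiter on simultaneously issued requests."""

    def __init__(self, queue_depth):
        self.allowed = queue_depth  # currently known queue depth value
        self.in_flight = 0          # requests sent but not yet completed
        self.pending = []           # requests waiting to be sent

    def on_queue_depth(self, new_depth):
        # S930 / S940: record the value just read and use it as the new cap.
        self.allowed = new_depth

    def submit(self, request):
        self.pending.append(request)

    def dispatch(self):
        """Send pending requests while the in-flight count is below the cap."""
        sent = []
        while self.pending and self.in_flight < self.allowed:
            sent.append(self.pending.pop(0))
            self.in_flight += 1
        return sent

    def complete(self):
        # Called once for each request the storage device finishes.
        self.in_flight -= 1
```

When the cap drops from 24 to 2 while 24 requests are outstanding, this sketch sends nothing new until completions bring the outstanding count below 2. Requests already issued are not recalled, which is one possible design choice and is not specified by the text.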
[0148] As shown in Figure 9, the bottom of step S940 is connected to step S910 via a dashed return arrow, indicating that steps S910 to S940 constitute a continuously running loop process. After step S940 is completed, the processor 110 continues to send subsequent input / output requests to the storage device 20 according to the adjusted sending strategy (i.e., returning to step S910). The processor 211 of the storage device 20 re-executes feature information extraction and dual-target evaluation for newly received input / output requests, and updates the queue depth descriptor again when the queue depth is further adjusted. After detecting that the queue depth descriptor has changed again, the processor 110 of the host system 10 repeats the response process of steps S920 to S940. Thus, a continuously operating bidirectional feedback loop is formed between the host system 10 and the storage device 20: the storage device 20 adjusts the queue depth online according to the load characteristics of input and output requests and feeds back the adjustment results to the host system 10 through the queue depth descriptor; the host system 10 dynamically adjusts the request distribution strategy according to the changes in the queue depth descriptor, so that the request distribution behavior of the host system 10 is always consistent with the current queue depth of the storage device 20.
[0149] Through the cyclic process of steps S910 to S940, the host system 10 can continuously sense the changes in the queue depth of the storage device 20 and dynamically adjust the request delivery strategy, thus avoiding the problem of overload or idle resources of the storage device 20 caused by the host system 10 not knowing the result of the queue depth adjustment and continuing to send requests in the original quantity.
[0150] In one embodiment, the specific operation of the above memory management method is illustrated using a database random read / write scenario as an example. When a database application running on the host system 10 generates a large number of 4KB random read requests, the processor 211 extracts the feature information of these input / output requests in step S310. The request sizes are generally small (size score value close to 0), physical address continuity is low (sequence score value close to 0.1), and, because these are foreground database queries, the priority score value is moderately high (e.g., 0.6). The latency tendency calculated by the processor 211 in steps S350 and S360 is generally higher than 0.70, while the throughput tendency is generally lower than 0.20. In the determination in step S420, since the latency tendency is greater than or equal to the first preset threshold (0.70) and the difference between the latency tendency and the throughput tendency is greater than or equal to the second preset threshold (0.10), the processor 211 quickly reduces the queue depth by the first step value in step S430 until it falls within the first queue depth range (e.g., reduced to 2). Meanwhile, in the determination of step S550, since the latency tendency of these requests is higher than the throughput tendency, the processor 211 allocates them to the latency-sensitive queue in step S560, so that they receive relatively high processing priority under the execution selection rules shown in Figure 6. Thus, in database random read / write scenarios, the storage device 20 reduces the queuing time for individual 4KB random read requests by decreasing the queue depth and provides a prioritized execution scheduling path by allocating them to the latency-sensitive queue.
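The figures in this scenario can be checked with a short calculation. The sketch below uses the example weights of the embodiments (a1=0.35, a2=0.35, a3=0.30 and b1=0.45, b2=0.35, b3=0.20). The assignment of each weight to a particular score (a1 to size, a2 to sequence, a3 to reversed priority; b1 to priority, b2 to reversed size, b3 to reversed sequence) and the function name `tendencies` are assumptions made for this illustration.

```python
A = (0.35, 0.35, 0.30)  # first weighted combination (a1, a2, a3)
B = (0.45, 0.35, 0.20)  # second weighted combination (b1, b2, b3)


def tendencies(size, order, priority):
    """Return (throughput tendency, latency tendency) for scores in [0, 1]."""
    # Assumed term order: a1*size + a2*order + a3*(1 - priority)
    throughput = A[0] * size + A[1] * order + A[2] * (1.0 - priority)
    # Assumed term order: b1*priority + b2*(1 - size) + b3*(1 - order)
    latency = B[0] * priority + B[1] * (1.0 - size) + B[2] * (1.0 - order)
    return throughput, latency


# 4KB random read: size score 0, sequence score 0.1, priority score 0.6
throughput, latency = tendencies(0.0, 0.1, 0.6)
print(round(throughput, 3), round(latency, 3))
```

Under this assumed assignment the throughput tendency is about 0.155 and the latency tendency is about 0.80, which agrees with the stated ranges (below 0.20 and above 0.70) and satisfies both conditions of step S420.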
[0151] In another embodiment, the specific operation of the above memory management method is illustrated using a large file sequential reading scenario as an example. When a video streaming application running on the host system 10 generates a large number of large sequential read requests (e.g., each request is 512KB or larger), the processor 211 extracts the feature information of these input / output requests in step S310. The obtained request sizes are generally large (size score value close to 1.0) and the physical address continuity is high (sequentiality score value close to 0.9). The throughput tendency calculated by the processor 211 in steps S350 and S360 is generally higher than 0.90, while the latency tendency is generally lower than 0.10. In the determination in step S440, since the throughput tendency is greater than or equal to the first preset threshold (0.70) and the difference between the throughput tendency and the latency tendency is greater than or equal to the second preset threshold (0.10), the processor 211 gradually increases the queue depth to within the range of the second queue depth (e.g., to 24) in step S450 with a second step value. Meanwhile, in step S510, since the physical address sequence of these requests meets the preset order condition, the processor 211 maintains its original instruction execution order in step S520 and does not enter the multi-queue allocation and reordering process. Thus, in the scenario of sequential reading of large files, the storage device 20 improves the parallel processing capability by increasing the queue depth and maintains the characteristic of continuous physical address access by maintaining the original execution order of sequential requests.
[0152] In another embodiment, the response process of the memory management method described above when the load characteristics change dynamically is illustrated using a mixed load switching scenario as an example. Initially, the host system 10 continuously sends large sequential write requests to the storage device 20, and the processor 211 maintains the queue depth within the second queue depth range (e.g., 24) based on the throughput tendency. Subsequently, the foreground database query application on the host system 10 begins to generate a large number of 4KB random read requests, and the newly arriving input / output requests are mixed with the previous sequential write requests. During the continuous execution of steps S350 and S360, the processor 211 detects a gradual increase in latency tendency. When the latency tendency increases to meet the trigger condition of step S420, the processor 211 rapidly reduces the queue depth in step S430 by the first step value (e.g., 4). Since the first step value is greater than the second step value, the queue depth decreases rapidly from 24 in decrements of 4, and after several rounds of adjustment, it drops to within the first queue depth range (e.g., below 4). After each adjustment, the processor 211 updates the queue depth descriptor in step S830 and sends a queue depth change notification to the host system 10 in step S840. After detecting a change in the queue depth descriptor in step S920, the processor 110 of the host system 10 reads the updated queue depth value in step S930 and adjusts the number of simultaneously issued requests from 24 to no more than 4 in step S940. Throughout the transition from throughput-sensitive load to latency-sensitive load, the asymmetric stepping strategy, in which the first step value is greater than the second step value, causes the queue depth to decrease faster than it increases, which helps to quickly reduce queuing time when latency-sensitive requests flood in.
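The effect of the asymmetric stepping strategy can be illustrated by counting adjustment rounds. In the sketch below, the first step value of 4 and the depths 24 and 4 come from the example above, while the second step value of 2 and the helper names `rounds_to_reach` and `trace` are assumptions for illustration only.

```python
def rounds_to_reach(start, goal, step):
    """Number of fixed-size adjustments needed to move from start to goal."""
    distance = abs(goal - start)
    return -(-distance // step)  # ceiling division


def trace(start, goal, step):
    """List the queue depth after each adjustment round."""
    depths = [start]
    direction = 1 if goal > start else -1
    while depths[-1] != goal:
        nxt = depths[-1] + direction * step
        # Do not overshoot the goal on the last round.
        nxt = min(nxt, goal) if direction > 0 else max(nxt, goal)
        depths.append(nxt)
    return depths


print(trace(24, 4, 4))            # [24, 20, 16, 12, 8, 4]
print(rounds_to_reach(24, 4, 4))  # 5 rounds to shrink with the first step value
print(rounds_to_reach(4, 24, 2))  # 10 rounds to grow with the assumed second step value
```

With these values the queue depth shrinks from 24 to 4 in five rounds but needs ten rounds to grow back, which is the asymmetry the paragraph describes.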
[0153] It should be noted that the preset conditions, parameter values, and algorithm selections involved in the above embodiments are all exemplary and do not constitute a limitation on the scope of protection of this disclosure. Specifically, the weight coefficients in the first weighted combination and the second weighted combination (a1=0.35, a2=0.35, a3=0.30 and b1=0.45, b2=0.35, b3=0.20 respectively in the above embodiments) can be adjusted according to the application scenario and load characteristics of the storage device 20. As long as the relationship between the size score and the order score in the first weighted combination participating in the calculation with positive weights and the priority score participating in the calculation with its reverse value remains unchanged, and the relationship between the priority score in the second weighted combination participating in the calculation with positive weights and the size score and the order score participating in the calculation with their respective reverse values remains unchanged, it falls within the scope of implementation of this disclosure. Similarly, the threshold division (32KB and 512KB in the above embodiments), the continuity threshold for determining physical address continuity (128KB in the above embodiments), and the sliding window size in the segmentation mapping rules used when normalizing the request size can be configured according to the physical page size of the memory module 220 and the bus bandwidth characteristics of the storage device 20. 
The first preset threshold (0.70 in the above embodiments) and the second preset threshold (0.10 in the above embodiments) used to trigger queue depth adjustment, the specific values of the first and second preset values, the upper and lower limits of the first and second queue depth ranges, the preset time threshold for determining timeout escalation (1 millisecond in the above embodiments), the preset number for triggering the anti-starvation mechanism, and the preset values of the rearrangement window and the maximum jump amplitude (16 and 8 respectively in the above embodiments) can all be configured according to the hardware processing capabilities of the storage device 20, the storage protocol specifications, and the performance requirements of the target application scenario. Furthermore, the specific rules for determining priority based on the historical access patterns of the data streams to which input / output requests belong (taking consecutive access thresholds of 50 and 20 times as examples in the above embodiments) can be replaced by other statistical analysis methods or machine learning inference methods, as long as their output can be mapped to a priority score between 0 and 1. The specific values and selections of the above parameters and algorithms do not affect the essence of the technical solution provided in this disclosure, and those skilled in the art can adjust and replace them without departing from the principles and scope of the technical solution of this disclosure.
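To make the role of the exemplary coefficients and thresholds concrete, the following Python sketch computes both tendencies from three scores in the range 0 to 1 and then applies the trigger conditions. It rests on stated assumptions: the "reverse value" of a score x is taken to be 1 - x; a1 and a2 are taken to weight the size and order scores and a3 the reversed priority score; b1 is taken to weight the priority score and b2 and b3 the reversed size and order scores. All function names are hypothetical.

```python
A1, A2, A3 = 0.35, 0.35, 0.30   # first weighted combination (exemplary values)
B1, B2, B3 = 0.45, 0.35, 0.20   # second weighted combination (exemplary values)
FIRST_THRESHOLD = 0.70          # first preset threshold (exemplary value)
SECOND_THRESHOLD = 0.10         # second preset threshold (exemplary value)


def reverse(score: float) -> float:
    """Assumed meaning of 'reverse value' for a score in [0, 1]."""
    return 1.0 - score


def throughput_tendency(size: float, order: float, priority: float) -> float:
    # Size and order enter positively; priority enters via its reverse value.
    return A1 * size + A2 * order + A3 * reverse(priority)


def latency_tendency(size: float, order: float, priority: float) -> float:
    # Priority enters positively; size and order enter via their reverse values.
    return B1 * priority + B2 * reverse(size) + B3 * reverse(order)


def adjustment(throughput: float, latency: float) -> str:
    """Return 'decrease', 'increase', or 'hold' for the queue depth."""
    if latency >= FIRST_THRESHOLD and latency - throughput >= SECOND_THRESHOLD:
        return "decrease"
    if throughput >= FIRST_THRESHOLD and throughput - latency >= SECOND_THRESHOLD:
        return "increase"
    return "hold"


# A small, random, high-priority read versus a large, sequential, low-priority write.
small_read = (0.0, 0.0, 1.0)    # (size score, order score, priority score)
large_write = (1.0, 1.0, 0.0)
for scores in (small_read, large_write):
    t = throughput_tendency(*scores)
    l = latency_tendency(*scores)
    print(scores, adjustment(t, l))
```

With these assumed mappings, a request whose three scores are all 0.5 yields equal tendencies and therefore no adjustment, which shows how the second preset threshold keeps ambiguous loads from moving the queue depth.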
[0154] Based on the descriptions of the above embodiments, the memory management method, input / output request scheduling method, and memory controller provided in this disclosure extract feature information from each input / output request and calculate throughput tendency and latency tendency. This expands the basis for adjusting queue depth from the overall system load level to a two-dimensional tendency evaluation based on each request. This enables the processor to distinguish between latency-sensitive requests and throughput-sensitive requests and perform directional adjustments accordingly. Furthermore, through an asymmetric stepping strategy where the first step value is greater than the second step value and a design where the first queue depth range and the second queue depth range do not overlap, the response time of latency-sensitive requests is prioritized when load characteristics change, and adjustment oscillations caused by the overlap of queue depths between the two scenarios are avoided.
[0155] In terms of instruction execution order optimization, the processor maintains the original execution order of sequential requests through order preservation judgment, and implements differentiated scheduling for non-sequential requests through the allocation of three-level execution queues and execution selection rules. Of these mechanisms, the timeout priority escalation mechanism keeps the waiting time of latency-sensitive requests bounded, and the anti-starvation mechanism ensures that throughput-sensitive requests still receive execution opportunities. The dual constraints of the no-boundary rule and the maximum jump amplitude ensure data integrity during the execution order adjustment process.
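The execution selection rules over the three execution queues can be sketched in Python as follows. This is an illustrative model only: the class and field names are hypothetical, the anti-starvation count of 8 is an assumed value, and the fallback to the latency-sensitive queue when both other queues offer nothing is an assumption that the rules as described do not spell out. The 1 millisecond timeout is the exemplary value from the embodiments.

```python
from collections import deque
from typing import Deque, Optional, Tuple

TIMEOUT_S = 0.001       # preset time threshold (1 ms in the embodiments)
STARVATION_LIMIT = 8    # preset number for anti-starvation (assumed value)

Request = Tuple[str, float]  # (name, enqueue time in seconds)


class Selector:
    def __init__(self) -> None:
        self.high: Deque[Request] = deque()
        self.latency: Deque[Request] = deque()
        self.throughput: Deque[Request] = deque()
        self.streak = 0  # consecutive picks from the high/latency queues

    def pick(self, now: float) -> Optional[Request]:
        # Anti-starvation: after a run of high/latency picks, serve throughput once.
        if self.streak >= STARVATION_LIMIT and self.throughput:
            self.streak = 0
            return self.throughput.popleft()
        if self.high:
            self.streak += 1
            return self.high.popleft()
        # Timeout escalation: the oldest latency-sensitive request has waited too long.
        if self.latency and now - self.latency[0][1] > TIMEOUT_S:
            self.streak += 1
            return self.latency.popleft()
        if self.throughput:
            self.streak = 0
            return self.throughput.popleft()
        if self.latency:  # assumed fallback so pending requests are not left idle
            self.streak += 1
            return self.latency.popleft()
        return None


s = Selector()
s.high.append(("H0", 0.0))
s.latency.append(("L0", 0.0))
s.throughput.append(("T0", 0.0))
print([s.pick(0.0005)[0] for _ in range(3)])  # ['H0', 'T0', 'L0']
```

In the usage above, the latency-sensitive request has waited only 0.5 ms, so the throughput-sensitive request is served before it; had the selection happened after 1 ms, the order of the last two would be reversed.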
[0156] In terms of collaboration between the storage device and the host system, the processor updates the queue depth descriptor and sends a queue depth change notification to the host system, enabling the host system to adjust the number of requests sent accordingly. This avoids storage device overload or resource idleness caused by the host system being unaware of the queue depth adjustment result.
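On the host side, the effect of the queue depth descriptor can be modeled with a short Python sketch. The descriptor is represented here as a plain object and the change notification as a direct method call; both are simplifications, and every class and method name is hypothetical rather than taken from any storage protocol.

```python
class QueueDepthDescriptor:
    """Stand-in for the device-side queue depth descriptor (hypothetical layout)."""

    def __init__(self, depth: int) -> None:
        self.depth = depth


class HostSubmitter:
    """Caps the number of in-flight requests at the last queue depth value read."""

    def __init__(self, descriptor: QueueDepthDescriptor) -> None:
        self.descriptor = descriptor
        self.limit = descriptor.depth
        self.in_flight = 0

    def on_depth_change(self) -> None:
        # Invoked when a queue depth change notification arrives: re-read the value.
        self.limit = self.descriptor.depth

    def try_submit(self) -> bool:
        # Refuse to issue a request that would exceed the current limit.
        if self.in_flight >= self.limit:
            return False
        self.in_flight += 1
        return True

    def on_complete(self) -> None:
        self.in_flight -= 1


desc = QueueDepthDescriptor(24)
host = HostSubmitter(desc)
print(sum(host.try_submit() for _ in range(30)))  # 24 requests accepted

desc.depth = 4            # device lowers its queue depth
host.on_depth_change()    # host reacts to the notification
print(host.try_submit())  # False until in-flight requests drain below 4
```

The sketch does not cancel requests that are already in flight when the limit drops; it only withholds new submissions until completions bring the count under the new value, which is one plausible reading of adjusting the delivery of subsequent requests.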
[0157] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this disclosure, and are not intended to limit them. Although this disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features therein. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of this disclosure.
Claims
1. A memory management method, applied to a storage device configured with a memory controller and a memory module, characterized in that the method comprises: receiving a plurality of input / output requests from a host system; extracting feature information of each input / output request from the plurality of input / output requests, and calculating a throughput tendency and a latency tendency of each input / output request based on the feature information; and adjusting a queue depth of the storage device based on the throughput tendency and the latency tendency.
2. The method according to claim 1, characterized in that the feature information includes the request size, physical address, and priority indication information of the input / output request, wherein the step of calculating the throughput tendency and latency tendency of each input / output request based on the feature information comprises: normalizing the request size to obtain a size score; determining, based on the physical address, the number of consecutive input / output requests, and obtaining an order score based on the proportion of consecutive requests; obtaining a priority score based on the priority indication information of the input / output request; and calculating the throughput tendency and the latency tendency respectively based on the size score, the order score, and the priority score.
3. The method according to claim 2, characterized in that the throughput tendency is calculated based on a first weighted combination of the size score, the order score, and the priority score, wherein the size score and the order score participate in the calculation with positive weights, and the priority score participates in the calculation with its reverse value.
4. The method according to claim 2, characterized in that the latency tendency is calculated based on a second weighted combination of the size score, the order score, and the priority score, wherein the priority score participates in the calculation with a positive weight, and the size score and the order score participate in the calculation with their respective reverse values.
5. The method according to claim 2, characterized in that the priority score is determined based on at least one of the following: a priority flag carried in the input / output request; and a priority determined based on the historical access patterns of the data stream to which the input / output request belongs.
6. The method according to claim 1, characterized in that the step of adjusting the queue depth of the storage device based on the throughput tendency and the latency tendency comprises: when the latency tendency is greater than or equal to a first preset threshold and the difference between the latency tendency and the throughput tendency is greater than or equal to a second preset threshold, reducing the queue depth by a first step value; and when the throughput tendency is greater than or equal to the first preset threshold and the difference between the throughput tendency and the latency tendency is greater than or equal to the second preset threshold, increasing the queue depth by a second step value.
7. The method according to claim 6, characterized in that the adjustment of the queue depth is limited by a preset queue depth range, wherein decreasing the queue depth is limited by a first queue depth range and increasing the queue depth is limited by a second queue depth range, and wherein the lower limit of the second queue depth range is greater than the upper limit of the first queue depth range.
8. The method according to claim 1, characterized in that the method further comprises: after adjusting the queue depth, updating a queue depth descriptor of the storage device; and sending a queue depth change notification to the host system by triggering an asynchronous event notification or a hardware interrupt signal, so that the host system can read the updated queue depth descriptor and adjust its delivery strategy for subsequent input / output requests accordingly.
9. The method according to claim 1, characterized in that the method further comprises: for input / output requests whose physical address sequence meets a preset order condition, executing them in their original instruction order; and for input / output requests that do not meet the preset order condition, assigning them to a plurality of execution queues based on the feature information, and selecting and executing a target input / output request from the plurality of execution queues according to preset execution selection rules.
10. The method according to claim 9, characterized in that the plurality of execution queues includes a high-priority queue, a latency-sensitive queue, and a throughput-sensitive queue, and the method further comprises: when the priority of the input / output request meets a preset high-priority condition, assigning the input / output request to the high-priority queue; when the priority of the input / output request does not meet the high-priority condition and the latency tendency is higher than the throughput tendency, assigning the input / output request to the latency-sensitive queue; and when the priority of the input / output request does not meet the high-priority condition and the throughput tendency is higher than the latency tendency, assigning the input / output request to the throughput-sensitive queue.
11. The method according to claim 10, characterized in that the execution selection rules comprise: selecting the target input / output request from the high-priority queue first; when the high-priority queue is empty, detecting whether the latency-sensitive queue contains an input / output request whose waiting time exceeds a preset time threshold and, if so, selecting the target input / output request from the latency-sensitive queue; and when the high-priority queue is empty and the latency-sensitive queue contains no input / output request whose waiting time exceeds the preset time threshold, selecting the target input / output request from the throughput-sensitive queue; wherein, after a preset number of input / output requests have been selected and executed consecutively from the high-priority queue or the latency-sensitive queue, at least one input / output request is selected and executed from the throughput-sensitive queue.
12. The method according to claim 9, characterized in that the adjustment of the execution order of the input / output requests is further subject to a no-boundary rule and a maximum jump amplitude; the no-boundary rule comprises: not reordering commands across barrier-type commands, and not changing the execution order of read and write requests that have data dependencies; and the maximum jump amplitude is limited by a preset rearrangement window and is determined based on the feature information, wherein the maximum jump amplitude is positively correlated with the priority of the input / output request, negatively correlated with the request size of the input / output request, and negatively correlated with the physical address order of the input / output request.
13. A memory controller suitable for a storage device configured with a memory module, comprising: a memory interface control circuit for electrically connecting to the memory module; and a processor electrically connected to the memory interface control circuit, wherein the processor is configured to: receive a plurality of input / output requests from a host system; extract feature information of each input / output request from the plurality of input / output requests, and calculate a throughput tendency and a latency tendency of each input / output request based on the feature information; and adjust a queue depth of the storage device based on the throughput tendency and the latency tendency.
14. An input / output request scheduling method, applied to a host system communicatively connected to a storage device, characterized in that the storage device is configured with a memory controller and a memory module, and the method comprises: sending a plurality of input / output requests to the storage device; detecting a change in a queue depth descriptor of the storage device, wherein the queue depth descriptor is dynamically updated by the storage device based on the throughput tendency and latency tendency of the plurality of input / output requests; in response to the change in the queue depth descriptor, reading the updated queue depth value; and adjusting, based on the queue depth value, the strategy for sending subsequent input / output requests, wherein the strategy includes adjusting the number of input / output requests sent simultaneously to the storage device so that it does not exceed the queue depth value.